SUMMARY


The nucleotide sequence determination of the genome of the Beaudette strain of the 
coronavirus avian infectious bronchitis virus (IBV) has been completed. The complete 
sequence has been obtained from 17 overlapping cDNA clones, the 5’-most of which 
contains the leader sequence (as determined by direct sequencing of the genome) and 
the 3’-most of which contains the poly(A) tail. Approximately 8 kilobases at the 3’ end 
of this sequence have already been published. These contain the sequences of mRNAs 
A to E within which are the genes for the spike, the membrane and the nucleocapsid 
polypeptides: the main structural components of the virion. The remainder of the 
sequence, equivalent to the ‘unique’ region of mRNA F, is some 20 kilobases in length 
and is thought to code for a polymerase or polymerases which are involved in the 
replication of the genome and the production of the subgenomic messenger RNAs. 
This sequence contains two large open reading frames, potentially coding for 
polypeptides of molecular weights 441000 and 300000. Unlike other large open 
reading frames in the virus, the 300000 open reading frame appears to have no 
subgenomic RNA associated with it which would allow it to be at the 5’ end of an 
mRNA species. Because of this, and because of the characteristics of the sequence in 
the region immediately upstream of its start codon, other mechanisms of translation, 
such as ribosome slippage, must be postulated. 


INTRODUCTION 


Avian infectious bronchitis virus (IBV) is the type species of the family Coronaviridae 
(Siddell et al., 1983a). Coronaviruses are enveloped, pleomorphic particles with a distinctive 
‘corona’ of club-shaped surface projections, and a large single-stranded RNA genome of positive 
polarity (Siddell et al., 1983b). In infected cells, in addition to genome-sized RNA, a number of 
subgenomic RNAs can be detected which have a common 3’ terminus, but extend for different 
lengths in the 5’ direction, forming a nested set (Stern & Kennedy, 1980a, b; Leibowitz et al., 
1981). In the case of IBV these are designated mRNAs A to F, mRNA A being the smallest and 
mRNA F being of genome length. In vitro translation studies have demonstrated that mRNAs 
A, C and E code for the nucleocapsid polypeptide, the membrane polypeptide and the precursor 
polypeptide to the spike or surface projection, respectively (Stern & Sefton, 1984). These three 
polypeptides form the three known structural proteins of coronavirus virions (Cavanagh, 1981). 
Sequencing of cDNA clones derived from IBV genomic RNA has shown that, in the case of 
mRNAs A, C and E, only the 5’ region of each mRNA which is not present in the next smallest 
mRNA is translated (Boursnell et al., 1984, 1985a; Binns et al., 1985b). This region is often 
referred to, for convenience, as the ‘unique’ region of the particular mRNA. For mRNAs B and 
D the situation is more complicated in that each mRNA has more than one open reading frame 
(ORF) and also has ORFs overlapping the next smallest mRNA (Boursnell & Brown, 1984; 
Boursnell et al., 1985b). 
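
The idea of a ‘unique’ region within a 3’-coterminal nested set can be illustrated with a short sketch. The coordinates and genome length below are hypothetical placeholders chosen for readability, not measured IBV positions, and the leader sequence is ignored; only the nesting logic follows the description above.

```python
# Hypothetical 5' start positions (genome coordinates) for a nested set of
# mRNAs that share a common 3' terminus. These numbers are illustrative only.
GENOME_LENGTH = 1000  # hypothetical genome length

starts = {"F": 1, "E": 700, "D": 800, "C": 850, "B": 900, "A": 950}


def unique_regions(starts, genome_length):
    """Return {mRNA: (start, end)}: the 5' region absent from the next smallest mRNA."""
    # Order from the longest mRNA (smallest 5' coordinate) to the shortest.
    ordered = sorted(starts.items(), key=lambda item: item[1])
    regions = {}
    for i, (name, start) in enumerate(ordered):
        if i + 1 < len(ordered):
            # The unique region stops just before the next smallest mRNA begins.
            end = ordered[i + 1][1] - 1
        else:
            # The smallest mRNA has no smaller neighbour, so its whole body counts.
            end = genome_length
        regions[name] = (start, end)
    return regions


regions = unique_regions(starts, GENOME_LENGTH)
for name in "FEDCBA":
    print(name, regions[name])
```

With these placeholder values the ‘unique’ region of mRNA F is by far the longest, which mirrors the situation described for the genome-sized mRNA.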

The genome of IBV is infectious (Lomniczi, 1977), indicating that it has a messenger function. 
There is also no evidence for a virion-associated RNA polymerase (Schochetman et al., 1977). 
On entry into the cell, therefore, the virion RNA probably codes for a polymerase, the gene for 
which must lie in the large 5’ region of the genome, the ‘unique’ region of mRNA F, which does 
not contain the genes for the structural polypeptides. This polymerase would then be used to 
synthesize a negative-stranded template. The negative strand could then be used by another 
polymerase, or a modified form of the same polymerase, to produce the subgenomic mRNAs 
and virion RNA. Both the negative strand and two distinct polymerase activities have been 
detected in cells infected with the coronavirus mouse hepatitis virus (MHV) (Lai et al., 1982; 
Brayton et al., 1982). Translation of MHV virion RNA in reticulocyte lysates produced three 
structurally related polypeptides of molecular weights greater than 200000 (200K) (Leibowitz et 
al., 1982). 

In this paper we present the nucleotide sequence, obtained from cDNA clones, of the ‘unique’ 
region of mRNA F, the genome-sized mRNA. The sequence of approximately 8 kilobases from 
the 3’ end of the genome, containing the genes for the major structural polypeptides, has already 
been published (Boursnell & Brown, 1984; Boursnell et al., 1984, 1985a, b; Binns et al., 1985b). 
The 20500 bases of sequence reported here complete the sequence of the IBV genome, which is, 
as far as we are aware, the first complete sequence of a coronavirus genome and the largest RNA 
virus genome sequenced to date. 


METHODS 


cDNA cloning. Seventeen cDNA clones covering the 3’-most 27.569 kb of the genome have been obtained. These 
are shown in Fig. 1. They have been derived from RNA isolated from gradient-purified virus of the Beaudette 
strain (Beaudette & Hudson, 1937; Brown & Boursnell, 1984). cDNA has been obtained by three methods: 
oligo(dT) priming (Brown & Boursnell, 1984), priming with specific oligonucleotides (Boursnell et al., 1984) and 
random priming with calf thymus DNA oligonucleotides (Binns et al., 1985a). The Southern blotting technique 
was used to identify overlapping clones (Southern, 1975). Specific cDNA clones were identified using ‘prime-cut’ 
probes. These are made by synthesizing labelled DNA from selected M13 clones using the normal sequencing 
primer, cutting with a restriction enzyme, and eluting the labelled, single-stranded probe from denaturing 
acrylamide gels (Biggin et al., 1984). 

Subcloning for M13 sequencing. Random subclones of each cDNA clone were generated by sonication 
(Deininger, 1983) and subcloning into SmaI-cut, phosphatase-treated M13mp10 (Amersham). Bacterial colonies 
containing M13 with inserts were grown, transferred to nitrocellulose filters, and probed with nick-translated 
purified viral insert DNA from the cDNA clone. Single-stranded templates were prepared from M13 clones 
identified as viral in this way. 

DNA sequencing. Sequencing was carried out by the dideoxy method (Sanger et al., 1977; Bankier & Barrell, 
1983). [α-35S]dATP was used in the sequencing reactions and the products were analysed on buffer gradient gels 
(Biggin et al., 1983). Additional sequencing information was obtained by reverse sequencing (Hong, 1981). For 
regions containing compressions due to DNA secondary structure, sequencing samples were run on hot (80 °C) 
gels or gels containing 42% formamide. For some regions cytosine residues were modified by the method of 
Ambartsumyan & Mazo (1980) prior to separating on gels, to reduce GC base pairing. Deoxyinosine triphosphate 
(Bankier & Barrell, 1983) and deoxy-7-deazaguanosine triphosphate (Mizusawa et al., 1986) were used in place of 
deoxyguanosine triphosphate in some cases, again to reduce GC base pairing. For sequencing directly from the 
viral RNA the method used was essentially as described by Caton et al. (1982). 

Computer analysis of the sequence data. Sequence data were read directly into a BBC microcomputer using a 
sonic digitizer (Graf/Bar, Science Accessories Corporation) and data were analysed on a VAX 11/750 using the 
programs of Staden (1982a, b, 1984a, b). Comparisons with the National Biomedical Research Foundation 
(NBRF) protein identification resource were made using the programs SEARCH and FASTP (George et al., 1986; 
Lipman & Pearson, 1985) and SEQHP (Kanehisa, 1982). 


RESULTS 
Selection of cDNA clones 


The majority of the cDNA clones which have been used to obtain the sequence of the ‘unique’ 
region of mRNA F were produced by a random priming method (Binns et al., 1985). Clone 182 
was produced by priming with a specific oligonucleotide from existing sequence at the 5’ end of 
mRNA D. Clone 227 was identified as coming from the 5’ end of the genome by probing a 
random library with leader-specific probes. The randomly primed clones 217, 216, 204, 210, 205, 
220 and 249 were mapped by identifying overlaps using Southern blotting. The nine clones were 
not contiguous but formed four blocks. cDNA clones in the region of the three remaining gaps 
were obtained using specific oligonucleotide primers. Clones spanning the gaps were identified 
either by using ‘prime-cut’ probes (Biggin et al., 1984) made from M13 subclones of cDNA clones 
on either side of the gap or by Southern blotting. Five clones, 256, 263, BP3, BP5 and BP8, 
were identified in this way and the overlaps confirmed by sequencing. Fig. 1 shows the positions 
of all the cDNA clones used in obtaining the complete sequence of the virus, and the positions of 
the oligonucleotide primers. 

Fig. 1. Diagram showing the positions of all the cDNA clones used in obtaining the nucleotide 
sequence. The squares at the end of some of the clones show the positions of oligonucleotide primers 
used to prime synthesis of cDNA for adjacent clones. Above the clones are shown mRNAs A to F. 
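
The mapping step described here, in which overlapping clones are grouped into contiguous blocks and the remaining gaps are then targeted with specific primers, can be sketched as an interval-merging problem. The clone coordinates below are hypothetical and were chosen only so that nine clones fall into four blocks separated by three gaps; they are not the mapped positions of the clones named above, and the overlaps in the study were established by Southern blotting and sequencing, not by computation.

```python
def merge_into_blocks(clones):
    """Group overlapping (start, end) intervals into contiguous blocks."""
    blocks = []
    for start, end in sorted(clones):
        if blocks and start <= blocks[-1][1]:
            # This clone overlaps the current block, so extend the block.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            # No overlap with the previous block: start a new one.
            blocks.append([start, end])
    return [tuple(block) for block in blocks]


def gaps_between(blocks):
    """Return the uncovered (start, end) stretches between consecutive blocks."""
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])
    ]


# Hypothetical coordinates for nine clones (illustrative only).
clones = [
    (1, 120), (100, 250),
    (300, 420), (400, 520), (500, 600),
    (650, 760), (740, 800),
    (850, 930), (900, 1000),
]

blocks = merge_into_blocks(clones)
gaps = gaps_between(blocks)
print(len(blocks), "blocks:", blocks)
print(len(gaps), "gaps:", gaps)
```

Each gap returned by such a procedure corresponds to a region for which a primer would be designed from the sequence flanking it.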


DNA sequencing 


Fourteen cDNA clones have been sequenced to obtain the complete sequence of the ‘unique’ 
region of mRNA F, the genome-sized messenger RNA. The 20500 bases of sequence presented 
here stretch from the 5’ end of the genome to an arbitrary position 190 bases 3’-wards of the end 
of the body of mRNA E. The 39 nucleotides at the very 5’ end of the genome have not been 
obtained in cDNA clones from the Beaudette strain, and the sequence here is derived from 
Maxam & Gilbert (1980) sequencing of primer-extended products from Beaudette virion RNA 
(Brown et al., 1986). Fig. 2 shows the DNA sequence obtained from the cDNA clones, with a 
translation in single-letter amino acid code of the main ORFs. 
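
The single-letter translation shown alongside the sequence can be reproduced with a few lines of code. The sketch below is not the software used in this work (the analysis used the programs of Staden); it is a minimal illustration that assumes the standard genetic code. The test sequence is the small ORF beginning at the first AUG, transcribed here from the Fig. 2 listing with the spacing removed.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA-sense sequence from its first base, stopping at a stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon reached
            break
        protein.append(aa)
    return "".join(protein)


# Small ORF following the first AUG, as read from the Fig. 2 listing.
small_orf = "ATGGCACCTGGCCACCTGTCAGGTTTTTGTTATTAA"
peptide = translate(small_orf)
print(peptide, len(peptide))
```

The translation stops at the UAA codon and gives a peptide of 11 residues, in agreement with the small ORF described in the sequence analysis.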


Sequence analysis 


Fig. 3 shows the positions of ORFs in this region. Most of the sequence encodes two very large 
ORFs which could code for polypeptides of predicted molecular weights 441K and 300K. These 
two large ORFs have been designated F1 and F2. 
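
How ORFs of this kind are located, and how a polypeptide size is estimated from ORF length, can be sketched as follows. This is an illustration only: the toy sequence is hypothetical, ORFs are taken here to run from an ATG to the next in-frame stop codon, and the figure of 110 Da per residue is a common rule-of-thumb average assumed for the example, not the method used to derive the 441K and 300K values.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end, n_codons) for ATG-to-stop ORFs in the three forward frames.

    start is the index of the A of the ATG, end is the index just past the
    stop codon, and n_codons counts the ATG but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, start, i + 3, n_codons))
                start = None
    # Longest ORFs first.
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)


def approx_mass_kda(n_residues, avg_residue_da=110.0):
    """Rough polypeptide mass in kDa, assuming an average residue mass."""
    return n_residues * avg_residue_da / 1000.0


# Hypothetical toy sequence containing two short ORFs in different frames.
toy = "CCATGAAATTTGGGTAACCATGCCCTGA"
orfs = find_orfs(toy)
for frame, start, end, n_codons in orfs:
    print(frame, start, end, n_codons, approx_mass_kda(n_codons))
```

Under the same rule of thumb, an ORF of about 4000 codons corresponds to roughly 440 kDa, which gives a sense of the scale of the two large ORFs.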

The first large ORF, F1, is not the first ORF to occur after the homology region. At position 
131 there is an AUG codon followed by a small ORF which could code for a polypeptide of 11 
amino acids. This AUG is the first initiation codon to occur on the genome. The second 
initiation codon is at the start of F1. Both the large ORFs have a codon usage (Staden & 
McLachlan, 1982) very similar to that of the genes for the structural polypeptides S, M and N. 
The small ORF also appears to have the same codon usage, insofar as that is significant for such 
a short sequence. After the end of the small ORF the reading frame is open, in the other two 
possible frames, for a further 232 or 73 bases but the codon usage of the predicted amino acids 
for these sections of ORF is not similar to that previously found for IBV. The sequence context 
around the first AUG codon is not similar to that used by most eukaryotic mRNAs (Kozak, 
1983) in that it has a pyrimidine at position −3. The context around the second AUG, on the 
other hand, has a purine at −3, in addition to a C at positions −1 and −4, both of which mean 
that it conforms well to the consensus for functional initiation codons. 
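
The comparison of the two initiation contexts can be checked directly against the sequence. The sketch below is a minimal illustration; the function name and the returned dictionary are choices made for this example. The two short stretches are transcribed from the Fig. 2 listing with the spacing removed, and the positions are numbered relative to the A of the AUG (written as ATG in the DNA sense).

```python
PURINES = {"A", "G"}


def initiation_context(seq, atg_index):
    """Report the bases at -4, -3 and -1 relative to the A of the ATG at atg_index."""
    if atg_index < 4 or seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index must point at an ATG with at least 4 upstream bases")
    minus3 = seq[atg_index - 3]
    return {
        "-4": seq[atg_index - 4],
        "-3": minus3,
        "-1": seq[atg_index - 1],
        "purine_at_-3": minus3 in PURINES,
    }


# Sequence around the first AUG (start of the small ORF), from the Fig. 2 listing.
first_region = "TAGTGCCCTGGATGGCACCTGGCC"
# Sequence around the second AUG (start of F1), from the Fig. 2 listing.
second_region = "GACAGTGACAACATGGCTTCAAGCC"

first = initiation_context(first_region, first_region.find("ATG"))
second = initiation_context(second_region, second_region.find("ATG"))
print("first AUG :", first)
print("second AUG:", second)
```

For these two stretches the first AUG has a T at −3, whereas the second has an A at −3 and a C at both −1 and −4, which is the pattern described above.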
ACTTAAGATAGATATTAATATATATCTAT TACACTAGCCT TGCGCTAGAT TTT TAACTT AACAAAACGGACTTAAATACCTACAGCTGGTCCTCATAGGT 


maAP GHLES GF CY * 
GTTCCATTGCAGTGCACTT TAGTGCCCTGGATGGCACCTGGCCACCTGTCAGGT TTT TGTTATTAAAATCTTATIGTIGCTGGTATCACTGCTIGTITIG 


CCGTGTCTCACTTTATACATCTGTTGCTTGGGCTACCTAGTGTCCAGCGICCT ACGGGCG ICG TGGCTGGT TCGAGTGCGAGGARCCTCTGGTICATCTA 
GCGGTAGGCGGGTGTGTGGAAGTAGCACT TCAGACGT ACCGGT TC TGTTGTGTGAAAT ACGGGGTCACCTCCCCCCACATACCTCTAAGGGLT TT TGAGC 
CTAGCGTTGGGCTACGT TC TCGCATAAGGTCGGCTATACGACGT T TGTAGGGGGT AG TGCCAAACAACCCC TGAGGTGACAGGTTCTGGTGGTGTT TAGT 


MAS S$ LK QGvS PK PROV ELV S KO TP 
GAGCAGACATACAATAGACAGTGACAACATGGCT TCAAGCC TAAAACAGGGAG TATC TCCCAAACCACGGGATGTCAT TCTTGTGTCCAAAGACATCCCT 


—EaqateceobdaAtL®ERF FY TS HN PK OY ADA F AVY RQK F ORSL AT 
GAACAACTTTGTGACGCTTTGTTTTICTATACGTCACATAACCCTAAGGAT T ACGCTGATGCT TT TGCAGT TAGGCAGAAGT TTGACCGTAGTCTCCAGA 


GkKQF K FET VOC GLFtetK & ¥ DK ITP GY PAK VY LK AT 
CTGGGAAACAGTTCAAATT TGAAACTGTGTGTGGTCICT TCCTCT TGAAGGGAGT TGACAAAATAACACC TGGCGICCCAGCAAAAG TT TAAAAGCCAC 


SK LA ODL EODTIF GY¥S PLARK Y RELL K TAC QW SLT Y 
TTCTAAGT TGGCAGAT T TAGAAGACATCT TTGGTGTCTCTCCTT TAGCGCGGAAGTACCGTGAAT TCT TGARAACAGCGTGTCAGTGGTCTCTTACTGTA 


—EAtLOVRAQTLOE TIF OPTETLWLeEQVAAK DCHV S SMA 
GAAGCACTGGATGT TCG TGCACAAACTC TCGATGAAAT TTT TGACCCCACTGAAATACT I TGGCT TCAGGTGGCTGCAAAAATTCATGTTTCATCTATGG 


MRRLWV GEV TAK VM OAL GSNLES ALF QAIVYK QaQtTAR 
CAATGCGCAGGCT TGT TGGAGAAGT AAC TGCAAAAGTCATGGATGCTCTGGGCTCAAACTTIGAGTGCTCT TTT TCAAAT TGT TAAACAACAAATAGCCAG 


IF Qk At ATF ENV NER PQRTIAALKM AP AK CAR ST 
AATCTTTCAAAAGGCACTGGC TAT TTT TGAGAATGTGAATGAATT ACCACAGCGTAT TGCAGCACT TAAGATGGCTTTTGCCAAGTGTGCTAGGTCAATT 


TVV VW ER TLWV K EF AG TOL AS INGA VY AK F F E EL P 
ACTGTTGTGGT IGT TGAAAGAACTCTAGT TGTTAAAGAGT TCGCAGGAACT TGTCT TGCAAGCAT TAATGGTGCTGTCGCAAAATICTT TGAAGAGT TGC 


N GFMmMGSK TF TTLA ER F K EA AY RV VENT PNA PRG T 
CAAACGGCTTCATGGGTTCTAAGAT TT TCACAACACT TGCCTTCTT TARAGAGGCAGC TG TGAGAGT TG TGGAGAACAT ACCAAA TGCACCGAGAGGTAC 


K GF EV VY GNAKGTQY¥VVVRGMRN ODL TLELLODAK AOI P 
TAAGGGAT TTGAAGT TGT TGGCAATGCCAAAGGCACACAGGTAGT TG TGCGCGGCATGCGAAATGACTTAACAT TGCT TGACCARAAAGCTGATATICCT 


V—eEPEGWSATLOGHLC§YWFRSGORF YAAPLSGNFEA 
GTTGAACCAGAAGGT TGGTCTGCAATT TIGGATGGACATCTTTGCTATGTCTT TAGGAGTGGTGATCGCTTTTATGCTGCACCTCTTTCAGGAAATITTG 


t S$ OV HCCERVV CLS DEVI PEITNOGLILAATY SS 
CTTTGAGTGATGTTCATTGCTGTGAGCGTGTAGTCTGTC TATCTGATGGTGTAACACCGGAGATAAATGATGGACTCATTCTAGCTGCAATCTACTCTTIC 


FSV S ELV TALK K GEPF KF LEGHK FV YAK OA AY S F 
TTTTAGTGTCTCTGAGCTTGTAACAGCTCT TAAAAAGGGTGAACCATTCAAGT TCT TGGGCCATAAAT TCGTGTATGCGAAGGATGCAGCAGTGTCTTTT 


TbLAK AAT OIADVWYLERLF QSARV ITAE DV WS S FTE K S F 
ACTTTAGCGAAGGCTGCCACTATTGCAGATGTCT TGAGGCTGT T TCAATCAGCTCCTGTGATAGCAGAAGATGTT TGGTCTTCATTTACTGAAAAGTCTT 


EF wk LAY GKVRNANLEE RF VK TYVY CK AQMSIVILAA 
TTGAATTCTGGAAGCT TGCATATGGAAAAGTGCGCAACCT TGAAGAATT TGTGAAGACCTATGTT TGTAAGGCTCAAATGTCGATIGTGATTCTAGCAGC 


VbLGEODTwWHtvs @v¥IYyk tGw¥tLF TK ¥ ¥ OF C DK HW K 
AGTGCT TGGAGAGGACATTTGGCATCTTGTCTCACAAGTCATCTATARAT TAGGIGTICTTTTTACTAAAGICGT IGACTITTGTGACAAACACTGGARA 


GrFevagdt«k RAKULIVTETF OCvVL KG VY A QHO PR OLE LE ODA 
GGTTTTTGTGTACAGT TGAAAAGAGCT AAGCTCAT TGTCACCGAAACCT TCTGTGTTT TARAAGGAGT TGCACAGCATTGTTTTCAACTGCTGCTAGATG 


IHS tYK SF K KCAL GRIHGORLLF WwW kK GoGGYHK IVaOD 
CAATACACTCTTTGTACAAGAGTTTTAAGAAGTGTGCACT TGGTAGAA TCCATGGAGATT TGCTCT TC TGGAAAGGAGGTGTGCATAAAAT TGTTCAAGA 


GoOEtTuwuFf OATOSVODVWVEDLGVYY Q@EKSITOF EV COO YV 
TGGCGATGAAATATGGTT TGACGCCATTGATAGTGT TGATGTTGAAGATC TGGGTGT TGT TCAGGAAAAATCEAT TGATTTTGAGGTT TGCGATGACGTG 


TLPENQPGHMVQTteEoOodDG& KN YM F F RF KK DENT Y ¥ T 
ACACT TCCAGAAAACCAACC TGGTCATATGGT TCAAATAGAGGATGATGGT AAGAACTACATGT ICT TCCGTTT TAAAAAGGATGAGAACATTTAT TATA 
PmMSQOLGATITNVVCKAGGKTVIFGETTVYVQEITPPPOD 
CACCAATGTCTCAACTTGGTGCTAT TAATGTGGTT TGCARAGCAGGCGGTAAGACTGTCACC TT TGGAGARACTACAGTACAAGAGATACCACCACCTGA 


UVP TK VS TE €.0 6 ERP WN TTF KK AY KE PFE vp T OL 
TGTCGTGCC TATTAAGGT TAGCATAGAATGT TGTGGTGAACCATGGAATACGATCT TCAAGAAGGCT TATAAAGAGCETATAGAAGTAGATACAGACCTC 


TveEQtLtsSv¥tTIYEk MCoOODLKLF PEAP EPP PF ENV AL 
ACAGTAGAACAATTGCTCTCTGTGATCTA TGAGAAAA TGTGTGACGACCT TAAATTGTT TCCAGAGGCACCAGAGCCTCCACCATTTGAGAATGTCGCAC 


V OK N GKOLODCIK SCHLIYROYESODODODIEEEODAE 
TTGTTGATAAGAACGGTAAAGATTTGGATTGTATAAAATCT TGCCATT TGATCTATCGTGACTATGAGAGCGATGATGACATCGAGGAGGAAGATCCTGA 


—EcoOTODSGEAEECOTNSECEEEDE DTK YL AL TAD 
GGAGTGTGACACAGAC TCAGGTGAAGC TGAGGAGTGTGACACTAAT TCAGAATGTGAAGAAGAGGATGAGGATACTAAAGTGT TGGCTCTTATACAAGAC 


PAS IK YPLPtLODBEODYSWYNGEITYWYHKODALOVV NLP S 
CCGGCAAGTATTAAATACCCTCTGCCTCTTGATGAAGAT TATAGCGTCTATAATGGATGTAT TGTACACAAGGACGCTCTTGATGTTGTGAATITACCAT 


Go €€ TF VV NN CF EGAVKPLEPQGKY¥ V¥YOVL GOW GEA Y¥ 
CTGGTGAAGAAACTTTTGT TGTCAATAACTGTTT TGAGGGAGCTGT TAAACCACT TCCACAGAAGGTAGT TGATGTTCT TGGTGAC TGGGGAGAGGCTGT 


DAQEQtCQaadqgeptrt.QAaHTFEEPYVWENSTGESSK TM TE Q 
TGATGCGCAAGAACAACTG TGTCAACAAGAGCCTCTGCAACATACCTT TGAAGAACCAGTCGAAAATTCTACTGGTAGTTCTAAGACAATGACTGAACAA 


VvvVE—EOQqetPV¥VYVYEQOaqgovvvy YTtTPTOLCEVAK ETAEE 
GTCGTTGTAGAAGATCAAGAACT ACCTGT TGT TGAACAAGATCAGGATGTAGT TGTTTATACACCTACAGATCT TGAAGT TGCAARAGAAACAGCAGAAG 


V DEF ITLIF AVPKEEvVS QKODGAQTIKQEPTIQVYVK 
AGGTTGATGAGTTTATICTCATTTTTGCTGT TCCT AAAGAAGAAGT TGTG TCCCAGAAAGA TGGGGCACAGAT TARACAAGAGCCTATTCAAGT TGTTAA 


PQReEKK AK KFKVYKPATCOEK PK F LE YK TCOVWVGOLT 
ACCACAACGTGAGAAGAAGCC TAAAAAGT TCAAAGT TAAACCAGCCACATG TGAGAAACCTAAAT TTT TGGAGTATAAAACATGTGTGGGTGATT TGACT 


VvIAK ALOE FPF K EF CILVNAANEHMTHGSGVWAKATA 
GTTGTAATTGCCAAAGCAT TGGATGAGTTTAAAGAGTTCTGCATTGTAAATGC TGCAAATGAGCATATGACTCATGGT AGTGGCGT TGCAAAGGCAAT TG 


OF CGLOFVEYCEODYVK KHGPQARLYWYTPSFVK GY 
CAGACTTTTGTGGACTGGATTTTGTTGAATATTGTGAGGACTATGT TAAGAAACATGGGCCACAACAGAGACT TGTTACACCT TCGTT TGTCAAAGGCAT 


QcVvNNVVGPRHGONNLHEKULVAAYKNVLEVOGV Y 
TCAATGTGTGAATAATGT TGT AGGACCCCGCCATGGAGACAACAACT TGCATGAGAAGCT TGT TGC TGCCTACAAGAATGIGCT TGTAGATGGCGTAGTC 


NYVVPVWLStGtIFGY¥OFKMS TIT OAMRE AF EGETIRYV 
AATTATGTTGTGCCAGTTCTTTCATTAGGAATTTTTGGTGTAGATT TTAAAATGTCAATAGACGCAATGCGTGAAGCT TT TGAAGGT TGCACCATACGCG 


LtLFStsSaqeExHITDYF DVTCKAKTIYt tTeEodoGYK YRS 
TYCTTT TGTTTTCTC TGAGCCAAGAACACATCGATTATT TCGATGTAACT TGCAAACAGAAGACAATT TATCT TACGGAGGATGGTGTTAAATACCECTC 


Ivt kK PGOStGQaFGAaVYAKNK IVF TADOVEODKETI 
CATTGTTCTAAAACCTGGTGACTCAT TGGGTCAATT TGGACAGGT TTATGC TAAARACAAGATAGT TT TT ACAGCCGATGATGT TGAGGACAAAGAAATT 


LYvVPTTOKSTILEYYGLOAQKYVIYLQTLAQK WYN V 
CTCTACGTCCCCACGACTGATAAAAGCAT TCTTGAATACTATGGT T TAGATGCGCAAAAGTATGTAATATATT TGCAAACGCT TGCGCAGAAATGGAATG 


QYRONFLILEWROGNCECWITSSATVWLL AAA KOR FEF K 
TCCAATATAGGGACAATTTTCTTATAC TAGAGTGGCGCGATGGAAAT TCT TGGATT AGT TCAGCAATAGTTCTCCTTCAAGCTGCTAAAATTAGGTTTAA 


GFeTEAWAKLLGEBPTOF ¥AwWOY AS CTA K YY GOD F 
AGGTTTTCTAACAGAAGCGTGGGCT AAACTGTTAGGTGGAGATCCTACAGACTTTGTTGCCTGGTGTTATGCAAGT TGTACTGCTAAAGTAGGTGATT TC 


S$ DANWLLAN LA EHF OAD Y TNAFPFEL KK RV SCN CGTIK 
TCAGATGCTAATTGGCTTTTAGCGAAT TTAGCAGAACAT TT TGACGCAGATTACACAAATGCGTTTCTTAAGAAGCGCGTTTCGTGTAACTGTGGTATTA 


sYeE€tRGtLEACTIAQPY RATNLLKHEF KTQG@YS NK CP TCG 
AGAGCTATGAGCTTAGAGGCCTTGAAGCTTGTAT TCAGCCAGT TCGGGCAACTAATCTGCTACAT T TTAAGACGCAATAT TCAAAT TGCCCAACCTGTGG 


ANN TODEVIEASLtLPYLtLLF ATOGPATW OCDE DA YV 
CGCAAATAATACGGATGAAGTAATAGAAGCT TCGTTACCGTACTTATIGCT TTT TGCTACTGATGGTCCTGCTACAGT TGAT TGTGATGAAGATGCTGTG 


GTVVvVFV¥GS TNS GHCYTQAAGQAPF ONL AK ORK F GK 
GGGACTGTCGTGTTTGTTGGTICTACTAATAGTGGCCAT TGT TAT ACACAAGC TGCAGGGCAAGCTTTTGATAATCT TGCTAAAGATAGAARATTTGGAA 
K SPY ITAM Y TRF APF KN ET SULEPWVYAK ASK GK SKS V 
AGAAGTCGCCTTACATTACTGCAATGTAT ACGCGAT TCGC TTT TAAGAATGARACCTCTT TGCCTGT TGCT ARACAGAGCAAGGGTAAGTCTAAGTCGGT 


KEODVS NLATS SK AS F ONL TOF EQWYODSNIYESL 
AAAGGAAGATGTTTCTAACCTTGCTACTAGT TCTAAGGCCAGTTT TGATAATCTTACTGACT TCGAACAGTGGTATGATAGTAACATCTATGAAAGTCTT 


KV Q@eES P ON F OK Y¥V SF TTKEOS KLPLTLELK VY RGTIRK S 
AAAGTGCAGGAATCACC TGATAACTT TGATAAATATGIGTCAT TCACAACAAAGGAAGATTCTAAGT TGCCAT TGACACTTAAGGT TAGAGGTATTAAAT 


Vv ODF R SK OGFIYKULTPDTOENSKAPY YY PVWLOA 
CAGTTGTTGACTT TAGATCGAAGGATGGTTTTATTTATAAGTTAACACCTGATACTGATGAARAT TCAAAAGCACCAGTCTACT ACCCAGTCT TGGACGC 


Ist KATWWEGNANFVVGHPNYYSK StLHIPTFWE 
TATTAGTCTTAAGGCAATATGGGTGGAAGGTAATGCTAACTTTGTTGTTGGTCATCCAAATTATTATAGTAAGTCTCTICATATTCCTACTTTTTGGGAA 


NA ENF VK MGODKIGGY¥TtTmMGtLEWRAEHLNKPNLERIF 
AATGCTGAGAATTTTGTTAAAATGGGTGATAAAAT TGGTGGTGTAACTATGGGACTT TGGCG TGCAGAACACCT TAATAAACCTAATT TGGAGAGAATTT 


NIAK KATY GSSY¥Y¥TTAetCGcKkKtIGk AAT EF IAD K YG 
TCAACATTGCTAAGAAAGCCAT TCT TGGATCTAGTGTTGTTACTACACAATGCGGTAAAT TAATAGGTAAAGCAGCTACATICAT TGC TGATAAAGT AGG 


GGvvRNITOSIKGtLCGITRGHFERKMSPQEFE LK T 
TGGTGGTGTAGT TCGCAATAT TACAGATAGCAT TAAGGGTCTTTGTGGAAT TACACGAGGGCAT TT TGAAAGAAAAATGTCTCCACAAT TCCTARAGACG 


LMF FLF Y FEKASVY¥KSVV AS YK TVYLEOK VV EL ATOELI 
CTTATGTICTITTTATTCTATTTCT TGAAGGCTAGTGTTAAGAGTGT TGTCGCTAGCTATAAGACCGTGTTATGTAAGGTGGTACTTGCTACTTTACTTA 


VuUF VY TSNPYWYOMF TGIRVLOFLFEGSLCGPYKODOY 
TAGTTTGGTTTGTCTACACAAGTAACCCAGTAATGTT TACAGGAATACGTGTGTTAGATTTTCTATTCGAGGGTICT TTGTGTGGTCCT TAT AAAGACTA 


GKODBSFODVLRYCADOF ITCRVCLHODBKDSLHLYKHA 
TGGTAAAGATTCTTTTGATGTGTTACGATATTGTGCAGATGATTTTATT TGTCGTGTGTGTT TACATGACAAAGAT TCACTTCATT TGTACAAACACGCT 


YSvEQVYKOAASGFITIFNWNWLEYLVFOELILFYKPYWA 
TATAGTGTAGAGCAGGTCTATAAAGATGCAGCTTCTGGTTTTATTTTTAAT TGGAATTGGCT TTATTTGGTCTTICTAATATTATITGTTAAACCAGTGG 


GFVIItICYCwkKYLVLNSTVLQATGVEOCFLOWFVQATY 
CAGGTTTTGTTATTATTTGCTATTGTGTTAAGTATT TGGTATTGAAT TCAACTGTGCTGCAAACTGGTGTTTGTITTTTAGATTGGTTTGTACAAACAGT 


FSHFNFMGAGFyYFWLELF YKIYIQVHHTILYCK OV T 
TTTTAGTCACTTTAATTTTATGGGAGCAGGGTTTTATTTCTGGCTCTTTTACAAGATATATATACAGGTGCATCATATACTGTATTGTAAGGATGTAACA 


cEvCK RVARSNRQEVSV¥VVVGGRKQIVHVYTNS GY 
TGTGAAGTGTGCAAAAGGGT TGCACGCAGCAACAGGCAAGAGGT TAGCGTGGT TGT TGGTGGACGCAAGCAGATAGTGCATGTTTACACTAACTCTGGCT 


NF CK RHNWYCRNCKCDOYGHANTFMSPEVWVAGELSE 
ATAACTTTTGTAAGAGACATAAT TGGTATTGTAGAAAT TGTGATGAT TATGGTCACCAAAATACATT TATGTCTCCTGAAGT TGC TGGCGAGCTCTCTGA 


K LK RHVKPTAYAYHVVYOEACLVODFRF VNLK YK AA 
AAAGCT TAAGCGCCATGT TAAACC TACAGCAT ACGCTTACCACGT TGTGGATGAGGCATGCTTAGTTGATGATTTTGTCAATTTAAAATATAAAGCTGCA 


TPGKODS AS SAVK CPF S¥VTODOF LK KAWVFLK EAL K CEQ 
ACTCCTGGTAAGGATAGTGCATCTTCAGCTGTTAAGTGTT TCAGTGTTACAGATT TCT TGAAGAAAGCTGTTTT TCT TAAGGAAGCAC TGAAATGTGAAC 


IsnoO GF IVCN TAS AHALEEAKNAATY YAR YL CE K 
AAATATCTAATGATGGTTTTATAGTGTGTAAT ACACAGAGTGCTCATGCAT TAGAGGAAGCAAAGAATGCAGCCATCTATTATGCGCAATATCTGTGTAA 


PTLrtetoqgqgatrt_YEqtvy¥EPnAPY SK S¥TOKYVCSTtLSS tT 
GCCAATACTTATACT TGACCAGGCACTTTATGAGCAAT TAGTAGTAGAGCCTGTGTCTAAGAGTGT TATAGATAAAGTGIGTAGCATTTTGICTAGTATA 


IsvDTAALNYKAGTLROBDOALLSITKOEE AV OMA T EF 
ATATCTGTAGATACTGCAGCTTTAAAT TAT AAGGCAGGCACACT TCGTGATGCICTGCTTTCTAT TACT ARAGACGAAGAGGCCGTAGATATGGLTATAT 


C HN HOY 8 Y 7G OG FTN TP SY GOT eT eK LTP OR DOR G 
TCTGTCATAATCATGATGTGGAT TACACTGGTGATGGTTTTACTAATGTGATACCGTCATATGGTATAGACAC TGGCAAGT TAACACCTCGTGATAGAGG 


FLINAODAS TIT ANLRVKNAPPY VW KFS ELTK ESOS 
GTTTTTGATAAATGCAGATGCTTCTAT TGCTAACT TAAGAGT TAAAAATGC TCCGCCGGTAGTATGGAAGT TT TCTGAGCT TAT TAAGT TGTCTGACAGT 


ect.K YLISATVWK SGBVWRF FI TK SGAKQVIATH TQKL 
TGTCTTAAATATTTAATT TCGGCTACTGTTAAGTCAGGTGT TCGTTTCTT TATAACAAAGTCTGGTGCTAAACAAGT TAT TGCTTGTCATACACAGAAGT 
Coronavirus IBV sequence completed 


tL ve K K AGGtTI*v¥sS GTF K CF KS YF KWL LIF Y TLE TA 
TGTTAGTAGAGAAARAGGCAGGTGGTATTGTTAGCGGCACCTTTAAGTGTTTTAAGAGT FATT TTAAATGGCTCT TGATCTTTTACATACTT TT TACAGC 


ccsGyyYyyYm@evS KS F VHP Mm Y OVNS TLHVEGFK YI 
ATGTTGTICGGGI TATTACTATATGGAGGTGAGTAAAAGTIT IGT ICACCCCATGTATGATGT AAACTCCACAC TGCATGTIGAAGSTITTAAAGTTATA 


Ok GWtLRETIvVPEOD TC FF SNK F VN F OA F UW GRP Y DN SR 
GATAAAGGTGTTCT TAGGGAAAT TGT ACCAGAAGATACATGTTTCTCTAATAAATIIGTTAAT TT TGATGCTTTT TGGGGCAGACCATATGATAATAGTA 


NCPIVTAYIOGOGTYATGCY PGF ¥Y Su vmoGY MEF YT 
GAAACTGTCCAAT TGTCACAGCTGT TATAGATGGTGATGGGACAGT AGCTACAGGTGT TCCTGGT TT TGTGTCCTGGGTTATGGATGGTGTTATGTITAT 


HMTQTERK PW Y TPT Ww F NRE TV GY TQO0OSTITEG S 
ACATATGACACAGAC TGAGAGAAAACCGIGGTACAT TCCTACTIGGTT TAATAGAGAAAT TGTCGGTTACACTCAGGAT TCAATTATTACTGAGGGTAGT 


FYytTts fALFSARCLYLTASNTPQAL YCFFNGON DAP G 
TTTTATACATCTATAGCGTTATTT TCCGCTAGGTGTTTATATT TAACAGCCAGCAATACACCTCAATTGTAT TGCTT TAATGGTGATAATGATGCACC TG 


AL PFGSTII$§IPHRV YF QPNGYVYRLIVPAQILHTPY V 
GGGCTTTGCCATT TGGTAGTATTATTCCTCATAGAGTT TAT T TCCAACCCAATGGTGTTAGGCT TATAGT TCCACAACAAATAC TGCACACACCLTACGT 


VK FW SDS YCRGSVCEYTRPGYCVWStLNPQwY VE FN 
AGTAAAGTTTGTATCAGACAGCTATTGTAGGGGTAGTGTGTGTGAGTACACTAGACCAGGT TACTGTGTGTCAT TAAACCCACAATGGGTTTTGTTTAAT 


oe YTS KPGY¥FCESTVREULMEF SMY¥ STF FT GY NPNYI 
GACGAATACACAAGTAAACCCGGTIGTTT TCTGTGGTTCTACTGT TAGAGAACTTATGTT TAGTATGGTTAGTACAT TCTTTACTGGTGT TAACCCCAATA 


YmaQteuaATM FF LELILVY WV VL TF AMY TK FF QG VF KAY AT 
TCTATATGCAATTAGCAACTATGT TTT TAATACTAGT IGT IGTTGTATTAATCTT TGCAATGGT TATAAAGTT TCAAGGTGTTTTTAAAGCTTATGCAAC 


TVF ITMtLEWwW VY IN AF Ttb Ov HS YNSVLAWTtLEVEL Y 
CACTGTTTTTATAACAATGTTAGTTTGGGTAATTAACGCATTTATTTTGTGTGTACATAGTTACAACAGTGT TT TAGCTGTTATATTACTAGTACTCTAT 


cYAStvVTSRN TV IIMHCwWLVF TF GkLIVPTwW EAC 
TGCTATGCGTCATTGGT TACAAGTCGCAATACTGTTATAATAATGCAT IGT TGGCTIGTTTTTACCTTIGGT TT AATAGTACCCACATGGTIGGCTIGTT 


YuteGFtTItrymytPLFLWCYECTTtKNTRKLY OGNE F V 
GCTACCTGGGATTTATTATTTATATGTATACACCGTTGTTTTTATGGTGT TATGGT AC TACAAAAAACACTCGTAAGCTGTATGATGGCAATGAGTTTGT 


GN YDBDtLAAK S FTF ¥ERGS EF VK LTNE TG OK FEA YL 
TGGTAATTATGATCT TGC TGCGAAGAGCACTT T TGTTAT TCGCGGC TCTGAATT TGTTAAGCT TACTAATGAGATAGGTGATAAATTTGAGGCCTACCTT 


SAYARULKYYSGTGSEQODyYLQ@ACRAWL AY AL OQ YR 
TCAGCGTATGCTAGATTAAAGTACTATTCAGGCAC TGGCAGTGAACAAGATTATT TGCAAGCT TGTCGTGCATGGTTAGCTTATGCTTTGGACCAATATA 


NSGVETIVYTPPRYSITGwv¥SRLQAQSGFKKEVSPSSA 
GAAATAGTGGTGTGGAAATTGTT TATAC TCCGCCACGTTACTCTAT TGGTGT TAGTAGAT TACAATCTGGTTTTAAGAAACTGGT TI TCTCCTAGTAGTGC 


VEK CI vVSVS YRGNNLNGLwWtLEGOTI YY CPRHVL GK 
TGTTGAAAAGTGCATTGTTAGTGTCTCTTATAGAGGTAATAATCT TAATGGACTGTGGCTAGGTGACACTATCTACTGTCCTCGTCATGTAT TGGGTAAG 


FSGDQwNOVWVLNLANNHE FEVTTAHGY TLELNVV SRR 
TTTTCAGGTGACCAATGGAATGATGTACTTAATCTTGCTAATAATCATGAGT T TGAAGTT ACAACTCAACATGGTGTTACTT TGAATGT TGTCAGTAGGC 


LCKGAWVYLELItCaAGTAVY AN AE TPR K Y KF IK ANC GOS FTE 
GTTTAAAAGGTGCAGT TT TAATT TTACAAAC TGCTGTTGCTAATGCTGAAAC TCCAAAGTATAAGTTTATTAAAGCTAAT TGTGGTGATAGTTTCACTAT 


ACAY GGTV¥VGLYPVTIMRSNGTIRAS FLAGACES 
AGCTTGTGCTTATGGTGGTACAGT TGTAGGACTCTACCCTGTTACTATGCGTTCTAATGGTACTATTAGAGCATCTTTTCT TGCGGGAGCCTGTGGTICA 


V GFN TEK GVW N FF YM HHLELPNALHTGTOLMG EF 
GTTGGTTTTAATATAGAAAAGGG TGTAGTTAATTTCTTTTATATGCACCATCTTGAGT TACCTAATGCAT TACACAC TGGAACTGACCTAATGGGTGAAT 


YG6EYVODEEVAQRVPPONLYVYTNNIWVA WEY AATIOS 
TCTATGGTGGTTATGT TGATGAAGAGGT TGCACAAAGAGTGCCACCAGATAATT TAGTTACTAACAATAT IGT AGCATGGCTCTATGCGGCAATTATTAG 


VK ES SF SLPKWLESTTVYSVODOYNK WAGON GE TP 
TGTTAAGGAGAGTAGTTTCTCGCTGCCTAAATGGT TGGAGAGTACTACTGTTAGTGT TGATGAT TATAATAAGTGGGCTGGTGACAATGGTTTTACACCA 


FS TS TATTK LSATTGCVOVCECK LULRTITIMVY KN S QW GG 
TTTTCTACTAGTACCGCTATTACTAAAT TAAGTGCTATAACTGGAGT TGATGTT TGTAAGCTCCT TCGCACTAT TATGGTAAAAAAT AGCCAGTGGGGTG 
M. E. G. BOURSNELL AND OTHERS 


OPILGQyYNF EODELTPE SV FN QIGGEVRLQSSFYR 
GTGACCCCATTTTAGGGCAATATAATTT TGAAGATGAAT TGACACCGGAGTCTGTATT TAATCAGAT TGGTGGTGTTAGATTACAATCTICTITTGTAAG 


K ATS WwW Fw SR OWL AC Pr ORrF WLC AT VLFT AY PL KF OY 
AARAGCTACATCTIGGTITIGGAGTAGAIGTGTGTTAGCT TGCTTICTIATTISTGIISTGIGCTAT IGTCTIGT I TACGGCAGTGCCACTTAAATTTTAT 


VYAAVILLMAWLEF IS FTV KHVMmMAYM OTF LL PTL I 
GTATATGCAGCTGTTATTTTGTTAATGGCTGTACTITTTATTICTITTACTGT TAAACATGT SATGGCATATATGGATACTITTCTAT TGCCAACATIGA 


Tv IIocgvwvcaEvePF IYNTLISAQVVIF LSQwWYodDPY VY 
TTACAGTTATTATTGGAGTTTGTGCTGAAGTGCCTTTCATCTACAATACTCTAAT TAGTCAAGT TGTTATTTTCTTAAGTCAATGGTATGACCCAGTAGT 


FOTM vy PW MF LPLVWLY TAF KOVAGCYMNSFNTSEL 
CTTTGATACTATGGT ACCATGGATGTTCTTGCCACTAGTGT TGTATACTGCTTTTAAGTGTGTACAAGGTIGCTATATGAAT TCT TICAATACTICTTIG 


cLmeY QF VK LGFVYIYTSSNTLTAYTEGNW ELF FEL 
TYAATGCTGTATCAGTT TGTGAAGTTAGGTTTTGTTATTTACACCTCTICTAATACTCT IAC TGCATACACAGAAGGTAAT TGGGAGTTATTCTTCGAGT 


VH TT Vt ANY SSN SLIGLEF ¥Y FF KCAK WME Y YON A T 
TGGTGCACACTACTGTGT TGGCTAATGTTAGTAGTAATTCTTTAATIGGT TTATTTGTT TT TAAGTGIGCTAAATGCATGTIGTAT TAT TGTAATGCAAC 


YUNNYVLMAV®E VN CTIGwteeTtTCYF GLYWUwWU VY NK VF 
ATACTTAAACAAT TATGTACTAATGGCAGT TATGGTTAACTGCAT TGGCTGGC TC TGCACT TGTTACTT TGGGT TGTAT TGGTGGGT TAATAAGGTTTTT 


GteTttGKk YNF KYV¥SV¥DQyYRYMCLHK INPPK TV WE V F 
GGTTTAACCTTAGGTAAATACAATTTTAAAGTTTCAGTAGATCAATATAGGTATATGIGT TI TGCACAAGATAAACCCACCTAAAACTGTGTGGGAAGTCT 


STN ITtrTraGgtrirGecorRrvtPTIatTtTvwvaQaktsoOvkK CTTV VY 
TTTCGACAAATATACT TATACAAGGAAT TGGTGGTGACCGTGTGT TGCCTAT TGCTACAGT TCAAGC TAAAT TGAGTGATGTAAAGTGTACAACTGTTGT 


LmaQgtuetTKLELNV EAN SK MHVY EVE LHN KIL AS OO YW 
TTTAATGCAGCT YT TGACTAAGCTTAATGT TGAAGCAAAT TCAAAAATGCATGTI TATCT IGT TGAGT TACACAATAAAAT TCT IGCTTCTGATGATGTT 


GecmonttGmtertt+tererostraotseycecoo4s«#etKRs 
GGAGAGTGCATGGATAATT TGTTGGGTATGCTTATAACACTATTTTGTATAGAT TCTACTATTGATTTGAGTGAGTAT TGTGATGACATACT TAAGAGGT 


TvVbLQgsvtTtaqerQrsuHtIPpSsSYAEYERAKNLYEKVLEV OS 
CAACTGTATTACAATCGGTTACTCAAGAAT TCTCACATATACCCTCTTATGCTGAATATGAARAGGGCTAAGAATCTTTATGAAAAGGT TTTAGTTGATTC 


KN GGVYTAQEtLAAYRKAANTIAK SV F ORODL AV AK K 
TAAAAATGGTGGTGT TACACAGCAAGAGCT TGC TGCATATCGTAAAGCTGCCAATAT TGCAAAGTCAGTTTTTGATAGAGACT TGGCTGTCCAAAAGAAG 


LOSMAERAMTTMYKEARVTORRAKLVSSLHALL F 
TTAGATAGCATGGCAGAGCGTGCTATGACAACAATGTATAAAGAGGCGCGTGTAACAGATAGACGAGCAAAAT TAGTCTCATCACTACATGCGTTACTIT 


S MLK K fOS EK LNVLFOQaAS S$ GY¥YPLATW PITY CS 
TCTCAATGCTTAAGAAAATAGATTCTGAAAAGCT TAATGTCT TGTT TGACCAGGCTAGTAGTGGTGT TGTGCCCCTAGCGACTGT TCCAATTGTTTGTAG 


NKLELTLC¥ TePOoOPE THU ¥Y K OV EGY AY TY STV VY WN TOT 
TAATAAGCTTACACT TGTAATACCAGACCCAGAAACGTGGCTCAAGTE TG TGGAAGGTGTGCATGTTACATAT TCAACAGT TGTT TGGAATATAGACACT 


VI OAODGTEtLHPTSTESEtLTFTYCISGANTAW PL KV N 
GTTATTGATGCCGATGGCACAGAGT TACACCCAACTTCTACAGGTAGTGGAT TGACATACTGTATAAGTGGTGCTAATATAGCATGGCCTTTAAAGGTTA 


tL TRNGHN KV¥OVVLELQNNELMPH GY K TK AC WA GV OD 
ACTTGACTAGGAATGGGCATAATAAGGT TGATGT TGTTT TGCAAAATAATGAGCT TATGCCACATGGTGT TAAAACAAAGGCT TGCGTAGCAGGTGTAGA 


QAHCSV¥ESKCYYTNISGNSVWVVAATTSSNPNL K YV 
TCAAGCACATTGTAGCGTAGAGTCTAAATGTTATTATACAAATAT TAGTGGCAAT TCAGT TGTAGCTGCTATTACTTCT TCAAATCCAAATCTGAAAGTA 


AS FULNEAGNQTYVOLCOBOBRPPCKFGCOMKYVGVWVKVEVV YL 
GCTTCGTTTT TGAATGAGGCAGGCAATCAGAT TTATGTAGACTT AGACCCACCATGTAAATT TGGCATGAAAGTGGGTGTCAAGGT TGAGGT TGTT TACT 


YF OK NTRS ITI VROGOMYLGARISNY ¥Y ¥LQSKGHETEE 
TGTATTTTATAAAGAATACAAGGTCGAT TGTTAGGGGTATGGTACT TGGTGCTATATCTAATGT TGTTGTCT TACAGTC TAAAGGGCATGAAACAGAGGA 


VDAVGITItStesFAVDPADTYCKYVAAGNQPLEN 
AGTGGATGCTGTTGGCATTCTTTCACTATGTICATT TGCAGTAGATCCCGCGGACACATAT TGTAAATATGTGGCAGCAGGTAATCAACCTT TAGE TAAC 


cvKMLTVHNGSGFATTSKPSPTPDQOSYGBGBAS VE 
TGTGTTAAAATGT TGACAGTGCATAATGGTAGTGGTTTTGCTATAACT TCAAAGCCAAGTCCTAC TCCTGACCAGGAT TCT TATGGAGGAGCT TCTGTCT 


L YC RAHIAHPGSVGNLOGRCQFKGSFYVQIPTTE 
GTCTCTATTGTAGAGCACACATAGCACA TCCAGGAAGTGT AGGAAAT T TAGATGGACGTTGTCAATTTAAAGGT TCTTTTGTGCAAATACCTACTACGGA 


K OPVGFCLRNKVCTV¥YCQCwrlrscyGetaqckostRapP tk 
GAAAGACCCCGTTGGAT TCTGTCTACGTAATAAGGT TTGCACTGTT TGCCAGTGTTGGAT TGGT TATGGATGTCAGTGTGAT TCACT TAGACAACCARAA 


§SvQSVvVAGASO0F DK NYLNGYGYAVRLG * 
TCTTCTGTTCAATCAGT TGC TGGAGCATCTGAT TT TGATAAGAATTATT TAAACGGGT ACGGGGT AGCAGTGAGGC TCGGC TGATACCCCT TGCTAGTGG 


MF QNLtLKRNACARFEFQE 
ATGTGATCCTGATGT TGTAAAGCGAGCCTT TGATGTT TGTAATAAGGAATCAGC TGGTATGTTTCAAAATT TGAAGCGTAACTGCGCTAGAT TCCAGGAA 


LROTEOGNL_EYLOS YF V¥KQtTTPSNYEHEKSECYE 
CTACGCGATACTGAAGATGGAAATCT TGAGTATCTTGATICTTACTT TGTAGT TAAACAAACCAC TCCTAGTAATTATGAACATGAAAAATCT IGT TACG 


OLKS Ev TtTAOH OF FV F NK NIYNISRQRLETKYTAMM 
AAGACTTAAAGTCAGAAGTAACAGCTGACCATGACTTCTTTGTGT TCAATAAGAACATTTACAATATTAGTAGGCAACGGCTTACTAAATATACTATGAT 


OF CYALRH F OPK BDECEVLKETLvVIyYGeteéeoyY HP K 
GGACTTCTGCTATGCTT TGAGACAT T TCGACCCAAAGGAT TGTGAAGTTCT TAAAGAAATACT TGTCACTTATGGTTGTATAGAAGACTATCACCCTAAG 


w FEENK OW YOPTEN SK YYVWVMLAKMGPIVRRALLN 
TGGTTTGAGGAGAATAAGGAT TGGTACGACCCAAT AGAAAACTCAAAATAT TATGTCATGT TGGCTAAAATGGGACCTATTGTACGACGTGCTTTATIGA 


ATEF GNLELMV EK GYVGVITLODNQOLNGK F Y OF GOD 
ATGCTAT TGAGT TCGGAAACCTTATGGT TGAAAAAGGT TATGTTGGTGTTATTACACTCGATAACCAAGACCT TAATGGCAAATTTTATGATTTTGGTGA 


FQqKk TAPGAGVPVF DTYYSYMMPTITAMTIOALAPE 
TTTTCAGAAGACGGCACCTGGTGCTGGTGTTCCTGTITTTGATACGTATTATTCTTACATGATGCCCATCATAGCCATGACGGATGCTT TAGCACCTGAG 


RY F EYOVHKGYK SYOtLLKYOYTEEKQEtF QK Y F K 
AGGTACTTTGAATATGATGTGCACAAGGGT TATAAATCTTATGATCTCCTCAAGTATGATTATAC TGAGGAGAAACAAGAAT TGTTTCAGAAGTACTTTA 


YwoQeEyYHPNCROBDCSOODRCLIHCANFNILFSTELI 
AGTACTGGGATCAAGAGTATCATCCTAACTGCCGTGACTGTAGTGATGACAGGTIGTTTGATACATTGTGCAAACTICAATATCTIGITTTCTACACTTAT 


PatTtTSFGNLCRKVFYVYOGYPFIATCGYHSKELGYI 
ACCGCAGACTTCTTTCGGTAATTTGTGTAGAAAAGTTTTTGT TGATGGTGTACCATT TATAGCTACT TGTGGCTATCATTCTAAGGAACT TGGTGTTATT 


mNQONTMSFSKMBLSQtLMQFVGDPALLVETSNNL 
ATGAATCAAGATAACACCA TGTCT TTT TCAAAAATGGGTT TAAGTCAACTCATGCAGT I TGT TGGAGATCCIGCTTTGT TAGTGGGAACATCCAATAATT 


VDLRTSCRrF SVCALTSGCITHATYKPGHF NK OF Y OD 
TAGTTGATCTTAGAACGTCTTGTTT TAGTGT TTGTGCGT TAACATCTGGTAT TACTCATCAAACGGTAAAGCCAGGTCACTTTAACAAGGATTTCTATGA 


FAEKAGMFKEGSSIPLKHF FYPQTGNAATINOD YO 
TTTTGCAGAGAAGGCTGGTATGTTTAAGGAGGGT TCGTCTATACCACTTAAACATTTTTICTATCCTCAAACTGGTAATGCTGCTATAAACGATTATGAT 


YYRYN RPTMF OTCQtLELLFOLEVTISKYFECYEGECI 
TATTATCGTTATAACAGGCCTACCATGTTTGACATATGTCAACTTCTATTTTGTTTAGAAGTGACT TC TAAATACTT TGAGTGT TATGAAGGCGGCTGTA 


PASQVVVNNLODKSAGYPFNKF GOK ARLYYEMSLE 
TACCAGCTAGCCAAGT TGTAGT TAACAACTTAGATAAGAGTGCAGGCTATCCATT TAATAAGTT TGGAAAAGCCCGCCTCTATTATGAAATGAGTCTAGA 


EQoqaq@trFrFeEtTtItTtTkKKNVLPTITQMNLKYALTSAKNRAR 
GGAACAGGACCAACTCT TCGAGATTACGAAGAAGAATGTCC TACCCACTATAACTCAAATGAATT TAAAATATGCCATATCCGCGAAAAATAGAGCGCGT 


TVAGYSTILSTMTINRQFHAKILKSIVNTRNASYYT 
ACAGTGGCAGGTGTGTCTATCCTTTCTACTATGACTAATAGGCAGTT TCATCAGAAGATTCTTAAGTCTATAGTCAACACTAGAAATGCTTCTGTAGTTA 


GTTtTK F YGGWONMLRNLIQGVWVEODPILMUGWOYPKE 
TTGGAACAACCAAGTTTTATGGCGGT TGGGACAACATGT TGAGAAACCTGATTCAGGGTGT TGAAGACCCAAT TCT TATGGGT TGGGATTATCCTAAGTG 


DRAMPNLLRIAASLVLARKHTNCCSHWSERIVY RE 
TGATAGAGCAATGCCTAATTTGTTGCGTATAGCAGCATCCT TAGTACT TGC TCGCAAACACACTAACTGTIGTAGT TGGTCTGAACGCATTTATAGGTTG 


YNECAQVLESETVLATGEGCIYvVKPGCGCTSSEDATTA Y 
TATAATGAATGCGCCCAGGTCT TATCTGAAACTGTACT TGC TACAGGTGGTATTTATGT TAAACC TGGTGGCACTAGCAGTGGTGATGCTACTACTGCTT 


AN SV FNIIQATSANWVWVARLCULSVITROIVYONTIK S 
ATGCAAACAGTGTTTTTAACATAATACAAGCCACATCTGCTAATGT TGCGCGTCTTTTGAGTGTTATAACGCGTGATATTGTCTATGATAATAT TAAGAG 


LQ@yYyeEetvyaaqvyYRRvVN FDODPAF VE K FY S YULCK NF SEL 
CTTGCAGTATGAATTGTATCAGCAGG TC TACAGGCGAGT TAATTT TGACCCAGCCTT TGT TGAAAAGTTTTATTCTTACTTATGTAAGAAT ITT TCCTIGC 


mIbLSoOdDODGY¥Yv¥CYNNTLAKQGtLVADIS GF REvL Y ¥Y Q 
ATGATCT TGTCTGACGACGGTGT IGTTTGTTACAACAACACAT TAGCCAAACAAGGTCTIGTAGCAGATATTTCTGGTTT TAGAGAGGTTCTCTACTATC 


NN VF MAODS KCOCWVEPOLEKGPHE FCS QHTMLWVE WV 
AGAATAATGTTTTTATGGCTGATTCTAAATGT TGGGT TGAACCAGATT TAGAAAAAGGCCCACATGAGTTTTGTTCACAACACACAATGCTAGTGGAGGT 


OGEPKYtPYPOPSRILGACKCYHWFRFVOODOVOK TEPYVA V 
TGATGGTGACCCTAAGTATT TGCCATACCCAGACCCT TCACGCATTTTGGGTGCATGTGTTTT TGTAGATGACG TGGATAAGACAGAACCTGTGGCTGTT 


me RY TALATIOAYPL VY HHENE EF YK K VF FW LL AY TR 
ATGGAGCGTTATATAGCTCT TGCCATAGATGCT TATCCACTAGTACATCATGAAAATGAAGAGTACAAGAAGGTATICTTIGT ICTCCTTGCATATATCA 


Ko ¥ @ £05 QAM CM OY S FV MW BOYD KG SK OF UD EQ EOF Y € 
GAAAACTCTATCAAGAGCTTTCTCAGAATATGCT TATGGACTACTCTIT TGTAATGGATAT AGACAAGGGTAGTAAAT TT TGGGAACAGGAGT TCTATGA 


NMYRAPTTLCASCGYCV¥VCNSQATILRCGNEIR KP 
GAATATGTATAGAGC TCCTACGACTTTACAATCT TGTGGCGTT TGTGTAGTT TG TAATAGTCAAACTATACT ACGCTGCGGTAATTGTATTCGTAAACCG 


FoeCCK CC YOHVMHTOHKNVLESTINPYTCSALGBCOGE 


TITTTGTGTTGTAAGTGTTGCTATGACCACGTCATGCATACGGACCACAAAAATGI TTTATCTATAAATCCTTATATTIGCTCACAGCTAGGTTGCGGTG 


ADVTKtLYtGGmSsS YF CGNHKPKLSIPLVSN GT VF 
AAGCAGATGTTACTAAAT TGTACC TCGGGGGTATGTCGTACTTCTGTGGTAATCATAAACCGAAAT TGTCAATACCGTTAGTATCTAATGGTACTGTITT 


GTYRANCAGSENVOODBF NQLATTINWSTVWYEPYILA 
TGGAATTTACAGGGCTAAT TGTGCTGGTAGTGAAAATGT TGATGATTT TAATCAACTAGCTACTACTAAT TGGTCCAT TGTCGAACCTTATATTTTAGCA 


NRCSOStLRRFAAETVKATEELCHKQAQFASAEVRE V 
AATCGCTGTAGTGATTCAT TGAGACGT TTTGC TGCAGAGACAGT ARAAGCCACAGAAGAAT TACAT AAGCAACAAT T TGCTAGTGCAGAAGTGCGAGAAG 


FS OREULCTtCSWEPGKTRPPLNRNY VF TG Y HEF TR T 
TATTCTCAGATCGTGAATTGATTCTATCATGGGAACCAGGAAAAACCAGGCCGCCAT TGAATAGAAATTATGTTTTCACAGGTTATCACTT TACAAGAAC 


S$ KV Q@tGOF TFEKGEGKOVVYYKATSTAKL SV GO 
TAGTAAGGTGCAGCT TGGTGATTTTACATTTGAAAAAGGTGAAGGTAAGGATGT TGTCTAT TATAAAGCAACGTCTACTGCTAAAT TGTCTGTAGGAGAC 


IF vVtLTSHNVVSLEVAPTLELCOCPQATF SRF VNLRPNVO® 
ATTTTTGTTTTAACCTCACACAATGTIGTTTCTCTCGTAGCGCCAACATTGIGTCCACAACAAACCTTTTCTAGGTT TGTAAATT TAAGACCTAATGTAA 


vVPECFRr VNNTITPLYHOEVGKQKRTTVQGPPGESEK SH 
TGGTACCTGAATGTTT TGTAAATAACAT TCCACTT TACCATT TAGTAGGT AAACAGAAGCGTACT ACAGTACAAGG TCCTCCTGGCAGTGGTAAATCCCA 


FATGLAWVY*F SSARVVF TACSHAAYVYOALCEK AF K 
CTTTGCTATAGGCCT TGCAGTATACTTTAGTAGCGCTCGTGTTGTTTTTACTGCATGT TCTCATGCAGLTGTIGATGCT I TATGTGAAAAAGCTT I TAAG 


FerKVOODCTRIVPAQRTtTVY OCF S KF KAN ODOT GK K Y TF 
TTTCTTAAAGT TGATGAT TGCACTCGTAT AGT ACCCCAAAGGACTACTGTCGATTGCT TCTCAAAAT T TAAAGCTAATGACACAGGCAARAAGTACATTT 


STINALPEVSCOTItLLVOEVSMLTNY ELS FIN GK 
TTAGTACTATTAATGCCT TGCCGGAAGT TAGT TGTGATATTCTT¥ TGGTTGACGAGGT TAGTATGT TGACCAAT TACGAATTGTCCTTTATTAATGGTAA 


In Yqyvyvvyv¥GOPAQLPAPRTLLNGSLSPKOY NV 
GATAAATTACCAATATGTTGTGTATGTAGGTGATCCGGCTCAAT T ACCGGCACCCCGCACTTTACTTAATGGT TCACTT TCTCCAAAGGATTATAATGTT 


VTNLCMVCVK POTF LAK EC YRCECPKETV OOTY S TLV Y D 
GTCACAAACCTTATGGTT TGTGTTAAACCTGATAT TT TCCT TGCAAAGTGT TATCGT TGTCCTAAGGAAAT TGTAGACACTGTGTCTACTCTTGTTITATG 


G KF IT ANNPES R ECF KVIVNNGNSOVGHES GS A Y¥ 
ATGGAAAGTTTATTGCAAATAACCCAGAATCACGTGAGTGTTTCAAGGTTATAGTTAATAATGGCAAT TCTGATGTAGGACATGAAAGTGGT TCAGCETA 


NT TQtLEFPFVK OF VORNK QWREATIF IS PYNAMN GR 
CAACACAACACAATTGGAATTTGTGAAAGACTTTGTT TGTCGCAATAAACARTGGCGGGAAGCAATAT TTATTTCACCT TACAATGCTATGAACCAGAGA 


AYRMLGLNVQTVYOSSQAGSEYODYVIF CW TAOS QHA 
GCTTACCGTATGCTTGGACT TAATGT TCAAACAGTAGATTCTICTCAAGGT TCAGAGTATGATTATGTCATCTICTGTGT TACTGCAGAT TCGCAGCATG 


LN INRFNVALTRAKRGIItCVVmMRAQROELY SAL K F 
CACTGAATATTAATAGATTTAATGTGGCGCTTACAAGAGC TAAGCGTGGTATACTAGT TGTCATGCGCCAGCGTGATGAATTGTATTICTGCTCTTAAGTT 


TELOSETStLAGTG&EtLF K ICN K EF S GVHPAY AV TT 
TACAGAGCTAGATAGTGAAACAAGTCTGCAAGGTACAGGTTTGTTTAAAATT TGCAACAAAGAATTTAGTGGIGTCCATCCTGCT TATGCAGTCACAACT 


K AL AAT YK VN DEL AACW NYE AG SETTY KHLEI SEL 
AAGGCTCTTGCTGCAACCTATAAAGT TAATGATGAACT TGCTGCACT TGTTAATGTGGAAGC TGGT TCAGAAATAACATATAAACATCTTATITCTICTGT 


GFK MSVNVEGCHN MF ITROEATRNVRGWwW VY GF OD 
TAGGATTCAAGATGAGTGTTAATGT TGAAGGC TGCCACAACATGTTTATARCACG TGATGAGGCAATCCOCAATGTAAGAGGT TGGGTAGGTTTTGATGT 


EATHACGTNIGTNLPF QVYGFSTGADF VV TPE GEL 
AGAAGCAACACATGCTTGTGGCACTAACAT TGGTACTAACCTGLCT TTTCAAGTAGGT T TCTCTAC TGGTGCAGACTTTGTAGTCACGCCTGAGGGACT T 


VO0OTS IT GNNFEPVWVNSKAPPAGEQFNHLELRVLF KS A K P 
GTAGATACTTCAATAGGCAATAATTT TGAGCCTGTGAAT TC TAAAGCACC TCCAGGTGAACAATT TAACCACTTGAGAGTGTTATT TAAAAGTGCTAAAC 


wW HV IT RPRIVQmMmLAONLENVSDCVWVF V TW CHGLE 
CTTGGCATGTTATAAGACCAAGGATAGTGCAGATGT TAGCAGACAATCTATGCAACGTTTCAGAT TGTGTAGTGT TTGTCACATGGTGTCATGGCCTAGA 


LTTLECRYFWKIGKEQvc S&S CGSRATTFONSHTAQAY A 
ACTAACTACTTTGCGCTATTTTGTTAAAATAGGCAAGGAACAAGTTTGTTICTTGTGGT TCTAGAGCTACAACTTTTAATTCTCATACTCAAGCTTATGCT 


cwuk HCLGFODBFVYYNPLLVODIAQAWEYSGNLAFNHODL 
TGTTGGAAGCATTGTTTGGGTTTTGATTTTGTTTATAACCCACTTCTAGTGGATAT TCAACAGTGGGGT TACTCGGGTAACCTACAGTTTAATCATGATT 


HC NVHGHAHVAS V¥OATOMTREOLAITN NA PF CODY NW 
TGCACTGTAATGTGCATGECCACGCTCATGCTAGCT TCTGTTGACGCTATAATGACTCGT TGICT TGCAAT TAACAATGCATTTTGTCAAGATGTCAACTG 


DLTYPHTANEODEVNS SCRYLQRMYLNACKCVW OAL K 
GGATTTGACATACCCTCACATTGCAAATGAGGATGAAGTCAATTCTAGT TGTAGATATCTACAACGCATGTATCTTAATGCGTGTGTIGATGCTCTTAAA 


VNWWYOTGNPKGITIKk« CVRRGOVN FRF YOK NPIWVRA 
GTTAATGTTGTCTATGATATAGGCAACCCTAAAGGTAT TAAATGTGT TAGGCG TGGGGATGTTAATTTTAGAT TCTATGATAAGAATCCAATAGTACGCA 


VK @FEYOYNQHK OK FADGLCMF WNONV DC Y POD N 
ACGTCAAGCAGTT TGAGTATGACTATAATCAGCACAAAGATAAGTT TGCTGATGGTCTTTGTATGT TT TGGAATTGTAATGTGGATTGTTATCCTGATAA 


StLVCRYODTRNLSWFNLPGCNGGESLEYVNKHAPFP YT 
TICCTTGGTTTGTAGG TATGACACACGAAATT TGAGTGTGTT TAACCT ACCAGGCTGTAATGGTGGTAGTCTGTACGT TAACAAACATGCATTCTACACA 


PK F DRISFRNLKAMPPR PF FY OS SPCETIQVOGYV AQ 
CCTAAATTTGACCGCATTAGCTTCCGCAATT TGAAAGCTATGCCATICTTTTTTTATGACTCATCGCCT TG TGAAACCAT TCAAGTGGATGGAGT TGCGC 


DLYSLATKOCTITtKCNIGGAVCKKHAQMY AEF VT 
AAGACCTTGTGTCTCTAGCTACGAAAGACTGTATCACAAAGTGCAACAT TGGTGGCGCTSTTTGTAAGAAACA TGCCCAGATGTATGCAGAATT TGTGAC 


SYN AAV TAGFTFwW VY TN KEN PYNLWK SF SAL QS I 
TTCTTACAATGCAGCTGTCACAGCTGGCTTTACTT TCTGGGTAACTAATAAACT TAACCCT TATAACT TATGGAAARGTITTTCAGCTCTCCAGTCTATC 


ON ITLAYNM YK GGHYOATAGEMPTYVYITGOK VF VIO @ 
GACAATATTGCTTATAATATGTATAAGGGTGGTCATTATGATGCTAT TGC TGGAGAAATGCCCACTGTCATAAC TGGAGACAAAGT TTT TGTTAT TGATC 


GveK AVF VN QTTLELPTSWAPF EL YAK RNIRTLP NN 
AAGGTGTAGAAAAGGCAGTTTTTGTTAATCAAACAACTCTACCTACATCTGTGGCGTT TGAGC TATATGCAAAGAGAAATAT TCGCACACTGCCAAACAR 


RILCKGLGYVOVTNGFYIWdDYANQATPLYRNANTYV KV EC 
CCGTATTTTGAAAGGTTTAGGTGTAGACGTAACCAATGGAT TTGTAATT TGGGAT TATGC TAACCAAACACCATTGTATCGTAATACCGTCAAGGTATGT 


AYTODIEPNGLVY¥VLYDORYGODYQAQSFLAADNAWL VS 
GCATATACAGATATTGAGCCAAATGGCCTAGTAGT TCTGTATGATGATAGATATGGIGATTACCAGICTTTICT 1GCTGCTGATAATGCIGTICTAGTTT 


TacyYyK«k R¥YS ¥VETPSNLLVQNGMPLEKOGANLY VY 
CTACACAGTGTTATAAGCGATATTCATACGTAGAAATACCATCTAAT T TGCTCGTTCAGAATGGTATGCCAT TAAAAGATGGAGCGAACCTGTATGTITA 


K RVN GAP VY TLEPN TIN TQGRS YETFEPRS OTF E RO 
TAAGCGTGTTAATGGTGCGTTTGTTACACTACCTAACACAATAAACACCCAGGGTCGAAGT TATGAAACTTTTGAACCTCGTAGTGACATTGAGCGTGAT 


FLAM™ S EES F VERY GKDLGLtLQAHTtLY GEvVoOK PAL GG 
TTTCTCGCTATGTCAGAGGAGAGTTTTGTAGAAAGGTATGGTAAAGACT TAGGCCTACAACACATACTGTATGGTGAAGT TGATAAGCCCCAATTAGGTG 


LHTVIGMmYRLLRANKLENAK SVY¥TNSOSOVMQN Y F 
GTTTACACACTGTTATAGGTATGTACAGACTCTTACGTGCGAATAAGT TGAACGCAAAGTCTGTAACTAAT TCGGAT TCTGATGTCATGCAAAATTACTT 


Vb S&S ON GCS YK OY CFP UV DLL LOB FL ELL RAW TL KEY 
19301 TGTATTGTCGGACAATGGTTCTTACAAGCAAGTGTGTACTGTTIGTGGATT TACTGCTTGATGATTTCTTAGAACTICTTAGARACATACTTAAGGAGTAT 19400 


GTNKSKVVTVS TDYHSTINFMATWF EODOGSIKTCYPA 
19401 GGTACTAATAAGTCAAAAGTTGTAACAGTGTCAAT TGATTACCATAGCATAAATTTTATGACTTGGTTTGAAGATGGCAGTATTAAAACATGTTATCCAC §=©19500 


Cas Aw TCE YNMPELYKVQNCV¥MEPCNIPN Y GY G 
19501 AGCTTCAATCAGCATGGACGTGTGGTTATAATATGCCTGAACTTTATAAAGTTCAGAAT TGTGTTATGGAACCTTGCAACATICCTAATTATGGTGTTGG 19600 


ITLeEPSGILCMNVAK YTAQLCASYLSKTITICVWPHNMR 
19601 AATAACGTTGCCTAGCGGTATTCTTATGAATGTGGCAAAGTATACACAACTTTGTCAATACCTTTCGAAAACAACAATTTGTGTACCGCATAACATGCGA § © 19700 


VM HF GAGSOKGYWYAPGESTVELKAWLPEGCTLULVONDI 
19701 GTAATGCATTTCGGAGCAGGAAGCGACAAAGGAGTGGCGCCAGGTAGTACTGTTCTTAAACAATGGCTCCCAGAAGGGACACTCCTTGTCGATAATGATA 19800 


VOYVS OAHVSVWVLSOCNKYNTEHK FOLVI SOM Y T 
19801 TTGTAGACTATGIGICTGATGCACATGTTTCTGTGCTTTCAGATTGCAATARATATAATACAGAGCACAAGTTTGATCTTGTGATATCTGATATGTATAC 19900 


ONODSKRKHEGVIANNGNOODWVFIYLULS SFLELRNNLA 
19901 AGATAATGATTCARAAAGAAAGCATGAAGGCGTGATAGCCAATAATGGCAATGATGACGTTTTCATATATCTCTCAAGTTITCTTCGTAACAATTTGGCT 20000 


LbGSFAVKVTETSWHEVLYDIAQGODCAWW TMF CTA 
20001 CTAGGTGGTAGTTTTGCTGTAAAAGTGACAGAGACAAGT TGGCACGAAGTTTTATATGACATTGCACAGGATTGTGCATGGTGGACAATGITTTGTACAG © 20100 


V NAS SSEAPFLIGVNYLGASEKVKY¥SGK TLHAN Y 
20101 CAGTGAATGCCTCTTCTTCAGAAGCATTCTTGAT TGGTGTTAATTATT TGGGTGCAAGTGAAAAGGTTAAGGTTAGTGGAAAAACGCTGCACGCAAATTA ©. 20200 


IF wuRNCNYLQTSAYSTF OVAK FDLRLELKATPYV VN 
20201 TATATTTTGGAGGAATTGTAATTATTTACAAACCTCTGCTTATAGTATATTTGACGTTGCTAAGTTTGATTTGAGATTGAAAGCAACGCCAGTIGTTAAT 20300 


LK TE QKTOLVFNLIKCEKLLEVRODOVGNTSF TS OS F 
MLV TPL Lb YT Lk 
20301 TTGAAAACTGAACAAAAGACAGACTTAGTCTTTAATTTAAT TAAGIGTGGTAAGTTACTGGTAAGAGATGTTGGTAACACCTCTITTACTAGTGACTCTT 20400 


vc TM * 
LcALCSAVLYDSSSYVYYY¥QSAFRPPSGWHLQG 
20401 TTGTGTGCACTATGTAGTGCTGTTTTGTATGACAGTAGTICTTACGTTTACTACTACCAAAGTGCCTTCAGACCACCTAGTGGTIGGCATTTACAAGGGG © 20500 


Fig. 2. The sequence of the ‘unique’ region of mRNA F from the Beaudette strain of IBV. Translations 
of the ORFs are shown in single-letter amino acid code. The amino acid is shown above the first base of 
the appropriate codon. The translation starting at position 20368 is the NH2 terminus of the spike 
precursor protein. 

Fig. 3. Diagram showing the positions of the main ORFs in the ‘unique’ region of mRNA F. The two 
large ORFs, designated F1 and F2, are shown, as well as a small ORF at the 5’ end of the genome, and 
the start of the spike precursor gene, which overlaps with F2. 


The second large ORF, F2, extends into the ‘unique’ region of mRNA E and in fact overlaps 
the coding sequences for the spike protein gene by 16 amino acids. 


Potential sources of error 


All the sequence information has been confirmed by sequencing M13 clones obtained from 
both strands of the DNA. In addition, most of it has been sequenced several times from different 
M13 clones. The 14 cDNA clones used to obtain the sequence of mRNA F contain, including 
overlaps, 24765 bases. During the shotgun sequencing of these clones 203113 bases have been 
sequenced, so that each base has, on average, been sequenced 8.2 times. However, there are two 
regions we have checked more carefully. The first is at positions 12340 to 12390 where F1 ends 
and F2 begins. An error here leading to a frameshift could make the difference between two 
large ORFs and one very large ORF. The second is at position 167 where the very small 11 amino 
acid ORF ends. A frameshifting error here could mean that this first ORF can continue for 
another 77 amino acids until position 397. There are two possible sorts of error. The first is an 
artefact in the sequencing gels leading to a misreading. The sequence on both strands appears 
perfectly clear in both these regions. To avoid gel compressions, both regions have been 
sequenced using formamide gels and high-temperature gels, and also with deoxyinosine 
triphosphate (Bankier & Barrell, 1983) or deoxy-7-deazaguanosine triphosphate (Mizusawa et al., 
1986) replacing deoxyguanosine triphosphate, and with cytosine-modified sequence reaction 
products (Ambartsumyan & Mazo, 1980). 
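
The average redundancy quoted above follows directly from the two totals given in the text. A minimal sketch of the arithmetic is shown here; the function name `mean_coverage` is introduced purely for illustration and is not part of any program used in this work.

```python
def mean_coverage(bases_sequenced: int, target_length: int) -> float:
    """Average number of times each base was read during shotgun sequencing."""
    return bases_sequenced / target_length


# Totals taken from the text: 203113 bases read over 24765 bases of cDNA clones.
coverage = mean_coverage(203113, 24765)
print(f"{coverage:.1f}")  # prints 8.2
```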

The second potential source of error is either a reverse transcriptase error during the synthesis 
of the cDNA or the occurrence of a mutant RNA molecule from which the cDNA was copied, 
both of which would lead to an incorrect cDNA clone. In the case of position 167 the sequence 
has been obtained from an equivalent clone from the M41 strain of IBV and is identical. In the 
case of the sequence between F1 and F2 the sequence has been confirmed from two additional 
independent cDNA clones, by sequencing directly from the double-stranded DNA using an 
oligonucleotide primer (Korneluk et al., 1985). Fig. 4(a) shows the relevant sequence in this 
region and Fig. 4(b) shows a sequencing gel of bases 12333 to 12390 obtained directly from a 
cDNA clone using an oligonucleotide primer. In addition the sequence has been obtained 
directly from the virion RNA using specific oligonucleotide primers at both of these points and 
has confirmed the original gel readings. At positions 12333 to 12390 the sequence has also been 
obtained from virion RNA obtained from the M41 strain of IBV, and the sequence in this region 
is identical. 

Gel compressions are thought to be caused by the presence of hairpin loops in the DNA 
migrating down the gel. Examination of the sequence in these regions shows that there are 
several possibilities for the formation of fairly large hairpins, including for example, at the 
position between F1 and F2, the sequence GGGGTA with its exact complement TACCCC 24 
bases further on. At this position (12380), in the region where the reading frame changes 
between F1 and F2, the sequence has been determined from ten separate M13 clones. It is 
interesting to note that one of these clones gave a different sequence reading in that a CT 
dinucleotide, which appears in the other nine M13 readings, was not present. This is unusual as 
normally all independent M13 clones agree. It is possible that the secondary structure in this 
region has some effect on the fidelity of copying by polymerases. 
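
Potential hairpin stems of the kind described above (a short sequence followed, a few bases downstream, by its exact reverse complement) can be located with a simple scan. The following is only a sketch under stated assumptions: the function names, the stem length of six and the maximum loop length are illustrative choices, and the toy sequence is invented rather than taken from the IBV genome.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem_len: int = 6, max_gap: int = 40):
    """List (stem_start, partner_start, stem) for each stem whose exact reverse
    complement occurs within max_gap bases downstream (0-based positions)."""
    hits = []
    for i in range(len(seq) - stem_len + 1):
        stem = seq[i:i + stem_len]
        partner = reverse_complement(stem)
        search_from = i + stem_len
        j = seq.find(partner, search_from, search_from + max_gap + stem_len)
        if j != -1:
            hits.append((i, j, stem))
    return hits


# Invented toy sequence: GGGGTA with its complement TACCCC starting 24 bases further on.
toy = "AA" + "GGGGTA" + "T" * 18 + "TACCCC" + "AA"
print(find_inverted_repeats(toy))
```

An exact-match scan of this kind ignores G:U pairing and mismatches within the stem, so it can only suggest candidate regions for closer inspection.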


Computer analysis 


Extensive computer analysis has been carried out in an attempt to identify some salient 
features on the bleak landscapes of these large ORFs. Searches for homologies with other viral 
polymerases have been performed using the NBRF protein identification resource (George et 
al., 1986). Short regions of fairly low homology with several viral polymerases can be identified 
but in general they do not rise significantly above the background of matches with proteins that 
are apparently unrelated. One region, between amino acids 1342 and 1350, has a fairly good 
match (8/9 amino acids) with the nsP2 protein of Sindbis virus, a protein which is known to be 
involved in RNA replication (Strauss & Strauss, 1983). This region also has a match with the 1a 
protein of brome mosaic virus. These matches are shown in Fig. 5. One of the most interesting 
matches is at the 5’ end of the first large ORF. The first 300 amino acids have a low-level but 
extensive homology with the replication initiation protein from Escherichia coli (Germino & 
Bastia, 1982). The homology is statistically significant and it may indicate that this region of the 
polymerase protein is involved in initiation of replication of either the positive or negative 
strands. 

The predicted amino acid sequences of the large ORFs have been compared against 
themselves and against each other to see whether there are any repeats which might represent 
two separate but similar polymerases. A dot matrix comparison, such as DIAGON (Staden, 
1982a), reveals no repeats. However, several low homology repeats can be detected using the 
program FASTP (Lipman & Pearson, 1985). These are shown in Fig. 6(a) beneath a 
hydrophilicity plot (Kyte & Doolittle, 1982) of the amino acid sequences of F1 and F2. Fig. 6(b 
to e) shows the amino acid matches in these regions. The spacing between the repeats marked A 
and B is very similar in both cases, 1157 amino acids in F1 and 1183 amino acids in F2. It is 
possible that these represent residual domains of homology between two polymerases which 
were at one time more closely related. The areas marked C and D also show regions of homology. 
The diagram also shows several very hydrophobic regions in the first large ORF which represent 
potential membrane-spanning domains. 


Fig. 5. Comparison between amino acid sequences of brome mosaic virus (BMV), infectious bronchitis 
virus (IBV) and Sindbis virus (SV). The BMV sequences are amino acids 748 to 838 of the 1a protein. 
The SV sequences are amino acids 785 to 878 of the nsP2 protein. The IBV sequences are amino acids 
1248 to 1356 of F2. A colon shows identical amino acids and a dot shows similar (Kanehisa, 1982) 
amino acids. The dashes in the sequences are blank characters inserted to achieve optimal alignment. 
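
A hydropathicity profile of the kind referred to above can be sketched as a moving average over the Kyte & Doolittle (1982) scale. The window of 41 residues and the plotting interval of 21 residues are those given in the legend to Fig. 6; the function name and the handling of sequence ends are assumptions made for illustration, and this is not the program actually used to draw the figure.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 41, step: int = 21):
    """Return (centre_position, mean_hydropathy) pairs, positions 1-based.

    Only full windows are scored, so the first and last window // 2
    residues have no value of their own (an illustrative choice).
    """
    half = window // 2
    profile = []
    for centre in range(half, len(protein) - half, step):
        segment = protein[centre - half:centre + half + 1]
        mean = sum(KD[residue] for residue in segment) / window
        profile.append((centre + 1, mean))
    return profile


# Positive values are hydrophobic, negative values hydrophilic.
print(hydropathy_profile("I" * 41 + "R" * 41))
```

Long runs of strongly positive values in such a profile are what mark candidate membrane-spanning domains.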

Computer analysis has also detected a homology between the non-coding region at the 5’ end 
of the positive strand, and the 5’ end of the negative strand (i.e. the reverse complement of the 
non-coding region at the 3’ end of the positive strand). This is shown in Fig. 7. These sequences, 
on the positive and negative strands, are approximately the same distance from their 5’ ends, 52 
bases and 48 bases [excluding the poly(A) tail] respectively, and may play some role in the 
replication of the positive and negative strands. 


Homology regions 


At position 599 the sequence CTGAACAA occurs. This is identical to the sequence which 
occurs in the ‘homology regions’ at the 5’ ends of the bodies of mRNAs D and E (Boursnell et al., 
1985b; Binns et al., 1985). These sequences are thought to be recognition sites for binding of 
the polymerase/leader complex during the synthesis of the subgenomic RNAs (Baric et al., 
1983). The same sequence CTGAACAA occurs at position 3293. Neither of these positions is 
known to be situated at the 5’ end of an mRNA species, as are all the other homology regions. 
We have attempted to determine whether there is some feature of the sequence context 
surrounding these homology regions which sets them apart from homology regions which are 
known to occur at the 5’ end of the bodies of mRNAs. Accordingly, a consensus sequence has 
been calculated from the sequences surrounding the known homology regions at the ends of 
mRNAs A to F. This consensus sequence includes six bases to the left of the core homology 
region CT(T/G)AACAA present in all the regions, the eight bases of the core homology itself, 
and four bases to the right. The consensus has been compared to the complete sequence using 
the computer program FITCONSENSUS (Devereux et al., 1984). The program successfully 
identifies the known homology regions with scores ranging from 74.6 to 64.1. The 14 next best 
fitting regions identified have a range of scores well separated from those of the known 
homology regions, with a tight cluster of scores (53.6 to 58.8). The CTGAACAA sequence at 
position 599 scores even lower. It seems probable, therefore, that the two CTGAACAA 
sequences at 599 and 3293 are chance matches with the core sequence, but when surrounding 
sequences are taken into account the differences are enough to ensure that they are not major 
sites for the binding of the leader/polymerase complex. 


Fig. 4. (a) The nucleotide sequence in the region between F1 and F2, with a translation in single-letter 
amino acid code of three reading frames. The amino acid is shown above the second base of the 
appropriate codon. Stop codons are marked as asterisks. The frames which are open in F1 and F2 are 
underlined and the methionine at the start of F2 is boxed in. (b) A DNA sequencing gel obtained by 
sequencing a double-stranded cDNA clone using an oligonucleotide primer. The sequence shown is 
from 12333 to 12390, and is the reverse complement of the sequence shown in (a). (c) The same three 
reading frames as shown in (a), with a graph for each showing the extent to which that reading frame 
conforms to the codon usage found for the amino acid sequence of F1 and F2. The frame which 
conforms best to the F1/F2 codon usage is marked with a series of dots and marked F1 or F2. Stop 
codons are marked as short vertical lines along the centre of each frame, and start codons as bars with 
filled-in circles on top. The two stop codons at 12339 (TAA) and 12382 (TGA) are marked as is the start 
codon at 12459. The program used is the ‘codon usage’ option from ANALYSEQ (Staden, 1984a, 
1983) and uses the method of Staden & McLachlan (1982). The parameters used were a window length 
of 25 and an output length of 1. (Codon usage analysis from the spike, membrane and nucleocapsid gene 
data gives a very similar result.) 


(b) Repeat A 

F1 484 EFVKTYVCKAQMSIVILAAVLGEOTWHLYSQVIYKLGVLF TK VYOF C--—-DKHWKGFCVQLKRAKLIVTE 
F2 1387 EFVKDFVCRNKQW---REAIF—ISPYNAMNGQRAYRMLGLNVQTVDSSQGSE YDYVIF CVT ADSQHALNIN 

F1 TFCVLKGVAQHCF QLLLDATHSL YKSF KKCALGR---IHGOLLF 
F2 RFNVALTRAKRGILVVMRQRDEL YSALKF TELDSE TSLQGTGLF 

(c) Repeat B 

F1 1630 VKMGOKIGGVTMGLWRAEHLNKPNLERIFNIAKKALVGSSVVTTQCGKLIGKAATF IADKVGGGYVRNITD 
F2 2570 VKVSGKTLHANYIFWRNCNYLQTSAYSIFOVAKF OLRLKATPVVNLK TEQK TDL VFNL IKCGKLLVROVGN 

(d) Repeat C 

F1 3696 VKTKACVAGVOQAHCSVESKCYY TNISGNSVVAAT TSSNPN---———| LKVASFLNEAGN---QT 
F2 1996 VKPTAYAYVVOEA-CLVODF VNLKYKAATPGKDSASSAVKCFSVTOFLKKAVFLKEALKCEQI 

(e) Repeat D 

F1 3438 LFCIDSTIDLSE-YCDDILKRSTVLOSVTQEF SHIPSY AE YERAKNL YEKVLDSKNG--GVT 
F2 430 LFCLEVTSKYFECYEGGCIPASQUVVNNLDKSAGYP-FNKFGKARLYYEMSLEEQOQLFEIT 


Fig. 6. (a) Hydropathicity plots (Kyte & Doolittle, 1982) of the predicted amino acid sequences of 
ORFs F1 and F2. Values above the line are hydrophobic and values below the line are hydrophilic. The 
hydropathicity is calculated using a moving window of 41 amino acids, with a value plotted every 21 
residues. The pairs of bars marked A, B, C and D show regions of partial homology [see Results and (b) 
to (e)]. (b to e) Amino acid sequences of the matches depicted by the bars in (a). A colon shows identical 
amino acids and a dot shows similar (Kanehisa, 1982) amino acids. The dashes in the sequences are 
padding characters inserted to achieve optimal alignment. 


48 TTARAGTTARETTAA---ACTAAAATT —TAGCTCT CCD TAATGGECGTCLTAGTGCTGTACCLT 109 

Fig. 7. Comparison between (top) the nucleotide sequence of the 5’ end of the genome and (bottom) the 
reverse complement of the 3’ end of the genome (i.e. the 5’ end of the negative strand). Colons show 
identical bases. The dashes in the sequences are padding characters inserted to achieve optimal 
alignment. 


Fig. 8. Nucleotide and predicted amino acid sequences where ribosomal frameshifting may occur. The 
top sequence is at the F1/F2 junction of IBV, and the bottom sequence is at the gag/pol junction of Rous 
sarcoma virus. Colons show identical bases. 
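
The first step of the search described in the Homology regions section, locating every occurrence of the core sequence CT(T/G)AACAA and extracting it with six bases to the left and four to the right, can be sketched as follows. This is an illustration only: the function name and the toy sequence are invented, and the scoring of each context against the consensus, which was done with FITCONSENSUS, is not reproduced here.

```python
import re

# Core homology sequence CT(T/G)AACAA, written as a regular expression.
CORE = re.compile(r"CT[TG]AACAA")


def homology_contexts(genome: str, left: int = 6, right: int = 4):
    """Return (position, context) for each core match, with position 1-based.

    The context is the core plus `left` bases upstream and `right` bases
    downstream; matches too close to either end of the sequence are skipped.
    """
    hits = []
    for match in CORE.finditer(genome):
        start, end = match.start(), match.end()
        if start >= left and end + right <= len(genome):
            hits.append((start + 1, genome[start - left:end + right]))
    return hits


# Invented toy sequence containing both variants of the core.
toy_genome = "GGGGGG" + "CTGAACAA" + "TTTT" + "AAAAAA" + "CTTAACAA" + "CCCC"
for position, context in homology_contexts(toy_genome):
    print(position, context)
```

Each 18-base context returned in this way is the unit that would then be compared with the consensus derived from the known homology regions.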


DISCUSSION 


The 20500 bases of sequence presented in this paper complete the sequence of the Beaudette 
strain of avian infectious bronchitis virus, the type species of the Coronaviridae. The complete 
sequence, excluding the poly(A) tail at the 3’ end, is 27608 residues. This is somewhat larger than 
the previously estimated size of the viral RNA which had been put at 20 to 24 kilobases 
(Lomniczi & Kennedy, 1977). The sequences of the ‘unique’ regions of mRNAs A, B, C, D and E 
have already been published, covering some 8 kilobases at the 3’ end of the genome and 
including the genes for the major structural proteins of the virus. The 20 kilobases at the 5’ end of 
the viral RNA constitute the ‘unique’ region of mRNA F, the genome-sized RNA. This is 
thought to code for a polymerase or polymerases which carry out all the necessary replication 
and transcription functions of the virus. 

Sequence analysis shows that the main part of the ‘unique’ region of mRNA F appears to 
contain two large ORFs. Because of the importance of determining whether there are one or two 
ORFs, we have considered the possibility that mRNA F in fact contained one very large ORF, 
and that a sequencing error or a mutant cDNA clone had led to a frameshift. Because of this the 
sequence in the region between the two ORFs has been checked exceedingly carefully. The 
relevant sequence is shown, with translations in the three reading frames, in Fig. 4(a). Any 
frameshift error must occur within 43 bases between positions 12341 and 12383. Two 
independent cDNA clones and direct RNA sequences from virion RNA give the same result. 
There are no obvious signs of sequence artefacts such as compressions, and indeed several gel 
systems and sequencing methods which could resolve compressions (see Methods and Results) 
do not show any change in the sequence. Fig. 4(b) shows a sequencing gel representing this 
region, obtained by sequencing a cDNA clone directly using an oligonucleotide primer. It can be 
seen that the sequence appears clear and unambiguous. Unless, therefore, there is some singular 
form of unresolvable and undetectable sequencing artefact, we must accept that the sequence 
here is correct. 
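
The reasoning above depends on where stop codons fall in each of the three reading frames: a frameshift can only join two ORFs within the stretch that is free of stops in both the outgoing and the incoming frame. A minimal sketch of such a scan is given here; the function name and the toy sequence are invented for illustration and are not taken from the IBV sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_by_frame(seq: str):
    """Return {frame: [positions]} giving the 1-based position of the first
    base of every stop codon in each of the three forward reading frames."""
    result = {0: [], 1: [], 2: []}
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                result[frame].append(i + 1)
    return result


# Toy example: frame 0 reads ATG AAA TAA GGG, frame 1 begins with TGA.
print(stops_by_frame("ATGAAATAAGGG"))  # {0: [7], 1: [2], 2: []}
```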

The problem now arises as to how translation of the second ORF, F2, is achieved. No mRNA 
has been detected at this point, and no homology region which might suggest the presence of one 
can be seen in the RNA sequence (see Results). It is possible that the ribosomes, having 
completed translation of the first ORF, F1, reinitiate translation at the first AUG of F2, or that 
internal initiation occurs, as appears to be the case with the phosphoprotein mRNA of vesicular 
stomatitis virus (Herman, 1986). There is, however, one piece of evidence that suggests that 
neither of these alternatives is the case. If the second ORF is genuinely a separate gene, then the 
70 or so bases preceding its initiation codon should be non-coding sequences, comparable to the 
5’ non-coding sequences preceding other IBV genes. In fact, if translated, they exhibit a heavy 
codon bias (Staden & McLachlan, 1982; Staden, 1984c) similar to the bias found in other IBV 
genes. This is shown graphically in Fig. 4(c) where it can be seen that the frame with typical IBV 
codon bias switches from that of F1 to that of F2 exactly at the point where the ORF changes. 
This strongly suggests that the sequences before the AUG of F2 have a coding function. One 
way to resolve this problem is to postulate that on some occasions, during translation of mRNA 
F, a ribosome slippage occurs, which introduces a frameshift and allows translation to continue 
unhindered from F1 into F2. Ribosomal frameshifting has been described in bacteriophage 
(Kastelein et al., 1982), prokaryotic (Atkins et al., 1972) and eukaryotic (Fox & Weiss-Brummer, 
1980; Jacks & Varmus, 1985) systems. Such a mechanism could be conceived in the case of IBV 
as a form of translational control designed to provide coordinated expression of two 
polymerases, with the protein from the first gene being produced at a higher level than that from 
the second gene. In the case of Rous sarcoma virus (Jacks & Varmus, 1985) expression of the pol 
gene requires a frameshift by the ribosome. Some well-controlled work by these authors, using 
cell-free translation systems, has demonstrated that the frameshifting is sequence-specific. 
Moreover it occurs ten times more efficiently in a eukaryotic system than in a prokaryotic 
system, indicating that there are specific eukaryotic signals to which the prokaryotic system 
responds poorly. The region of sequence responsible for the frameshifting has been narrowed 
down to 24 nucleotides. Both IBV and Rous sarcoma virus require a shift into the —1 frame to 
occur, and it may be that similar frameshifting signals are present in both sequences. 
Accordingly the 24 nucleotides of Rous sarcoma virus sequence have been compared to the 43 
nucleotides of IBV sequence within which any frameshift must occur (see Fig. 4a). Interestingly 
a match of 8/9 nucleotides can be found, both sequences occurring in the same frame and both 
within 20 bases of the termination codon (see Fig. 8). Further work will be needed to determine 
whether this sequence forms part of any signals which may promote ribosomal frameshifting. 
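
The codon bias argument above can be illustrated with a simplified frame-scoring scheme. The sketch below is written in the spirit of codon usage methods such as that of Staden & McLachlan (1982) but is not a reimplementation of it: the add-one smoothing, the use of a mean log frequency, and all function names are assumptions made for illustration. Only the window length of 25 codons is taken from the legend to Fig. 4.

```python
from collections import Counter
from math import log


def codon_frequencies(reference_orfs):
    """Build a smoothed codon frequency function from in-frame reference ORFs."""
    counts = Counter()
    for orf in reference_orfs:
        for i in range(0, len(orf) - 2, 3):
            counts[orf[i:i + 3]] += 1
    total = sum(counts.values())
    # Add-one smoothing over the 64 codons so that unseen codons score > 0.
    return lambda codon: (counts[codon] + 1) / (total + 64)


def best_frame(window, freq):
    """Return the frame (0, 1 or 2) whose codons best fit the reference usage."""
    scores = []
    for frame in range(3):
        codons = [window[i:i + 3] for i in range(frame, len(window) - 2, 3)]
        # Mean log frequency, so frames with one codon fewer are not favoured.
        scores.append(sum(log(freq(c)) for c in codons) / len(codons))
    return max(range(3), key=lambda f: scores[f])


def frame_profile(seq, freq, window_codons=25):
    """Best-fitting frame for each window, stepping one codon at a time."""
    width = 3 * window_codons
    return [(start + 1, best_frame(seq[start:start + width], freq))
            for start in range(0, len(seq) - width + 1, 3)]


# Toy reference with an extreme bias towards the codons GCT and GAA.
freq = codon_frequencies(["GCTGAA" * 20])
print(best_frame("T" + "GCTGAA" * 5, freq))  # 1: the biased codons lie in frame 1
```

Applied across a junction such as that between F1 and F2, a profile of this kind would be expected to switch from one frame to another at the point where the coding frame changes.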

For each of the other IBV mRNAs, the first AUG to occur after the homology region either is 
used to initiate synthesis of a protein, as is the case for the spike and membrane proteins (Binns 
et al., 1985b; Boursnell et al., 1984), or is present at the start of a reasonably sized ORF which 
could code for a polypeptide of 7K or more. Thus it is surprising to find the first AUG, at 
position 131, at the start of a small, 11 amino acid, ORF. The sequence context around this first 
AUG does not conform to Kozak’s consensus for functional initiation codons whereas the 
context around the second AUG does. A similar small ORF of 12 amino acids occurs at the 5’ end 
of RNA 1 of alfalfa mosaic virus (Cornelissen et al., 1983), an RNA species encoding a 115K 
product thought to be involved in RNA replication. In this case also only the second AUG 
conforms to the Kozak consensus. Both these cases suggest the possibility that the ribosomes can 
bypass the first, non-functional, AUG and initiate translation at the second. It is likely that this 
also occurs in mRNA D of IBV to allow translation of the second and third ORFs (Boursnell et 
al., 1985b). 
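
The comparison of an AUG with Kozak’s consensus can be sketched as a check of the two positions generally regarded as most important in that consensus, a purine at -3 and a G at +4, where the A of the AUG is +1. Which features were actually examined in the analysis above is not stated, so this choice, the function name and the example sequences are assumptions made for illustration.

```python
def kozak_features(seq: str, start: int):
    """Return (purine_at_minus_3, g_at_plus_4) for the AUG at 0-based index start.

    The sequence may be given as RNA or DNA; U is treated as T.
    """
    seq = seq.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("no AUG at the given position")
    minus3 = seq[start - 3] if start >= 3 else ""
    plus4 = seq[start + 3] if start + 3 < len(seq) else ""
    return minus3 in ("A", "G"), plus4 == "G"


# Invented examples: a favourable and an unfavourable context.
print(kozak_features("GCCACCAUGG", 6))  # (True, True)
print(kozak_features("UUUUUUAUGU", 6))  # (False, False)
```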

It is not known for coronaviruses whether the sequences at the 5’ end of the genome produce a 
polyprotein which is subsequently cleaved into separate proteins, as is the case for alphaviruses 
(Strauss et a/., 1984), or whether the viral polymerase acts as an extremely large multifunctional 
enzyme. Whether or not it is cleaved post-translationally into separate proteins, such an enzyme 
would need to perform several functions. First it must synthesize the negative-stranded 
template. From this template it must synthesize the leader sequence and then the subgenomic 
mRNAs, for which it needs the ability to recognize highly conserved signal sequences (Baric et 
al., 1983, 1985; Spaan et al., 1983; Brown & Boursnell, 1984), a capping ability (Lai et al., 1982) 
and probably the ability to reinitiate transcription at these points (Lai et al., 1985; Makino et al., 
1986). If it is cleaved into separate proteins it may encode a protease function to do this. Two 
polymerase activities, early and late, have been identified in MHV-infected cells (Brayton et al., 
1982). These have different ionic requirements and different pH optima. Both polymerase 
activities are associated with two different membrane fractions, a light fraction which appears 
to synthesize positive-stranded genome-size RNA and a heavy fraction which also synthesizes 
subgenomic RNAs (Brayton et al., 1984). Some evidence for two polymerase-coding genes can 
be found in the nucleotide sequence of mRNA F, in that there are small regions of residual 
homology between the predicted amino acid sequences of F1 and F2 (see Results and Fig. 6). 

The question of whether the cDNA clones sequenced in this study might derive from mutant, 
non-viable RNA molecules is an interesting one. The error rate of RNA polymerases is fairly 
high (Steinhauer & Holland, 1986) and many of the RNA molecules in an infected cell may be 
different from that in the original infecting virus. If the mutation rate is 1 in 10000 then over the 
20 kilobases of sequence presented here, there may be one or two changes each time one strand 
is copied into another. While the viral RNA is replicating within the cell, it is likely that 
mutant, and possibly defective, virion RNA molecules will accumulate with little selection 
against them, and, unless they have gross structural defects, most of them will be packaged into 
virions. It is these virions, without any further selection for viability, which are used to extract 
the RNA which is used to synthesize cDNA. In addition the infecting virus will be a mixture of 
different RNA molecules, even though it has been plaque-purified. Nevertheless, 
there is no evidence for very high mutation rates in the cDNA clones which we have sequenced 
here. For the clones covering the 20 kilobases there are 4659 bases of overlap between separate, 
independent clones (all made from the same RNA preparation). In the overlap regions there was 
not one difference, there being 100% agreement between the sequences from adjacent clones. 

This is in contrast to results found by Schubert et al. (1984) while sequencing the polymerase 
gene of vesicular stomatitis virus. The gene spans 6380 nucleotides and each region was 
sequenced from approximately three cDNA clones, giving 19140 nucleotides of overlap. In 
these 19140 nucleotides they found 20 nucleotide changes, including four insertions or deletions, 
giving an overall mutation rate of approximately 10⁻³. In the 9318 (4659 × 2) nucleotides of 
IBV cDNA clones which can be checked on another clone, there were no changes. Over 9318 
nucleotides a mutation rate of 3.2 × 10⁻⁴ would give a 95% probability of at least one 
nucleotide change; thus, since there were no changes, the overall mutation rate is probably lower 
than this. Given the number of rounds of replication which will have occurred between the 
original plaque isolation and the production of the cDNA clones, the mutation rate per base 
incorporated is likely to be considerably lower than this. It is interesting to speculate on the 
disparity between the vesicular stomatitis virus and the IBV results in this case, and on whether 
the (presumably) very large IBV polymerase, or polymerases, has a lower intrinsic error rate 
than the VSV polymerase. 
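
The arithmetic behind these figures can be checked with a short sketch. It assumes that changes at each nucleotide are independent and equally likely, so that the probability of seeing no change in n nucleotides is (1 - p)^n; the variable and function names are illustrative and not part of the original analysis.

```python
def rate_giving_detection(n_bases, prob_detect=0.95):
    """Per-base rate p at which at least one change would be seen
    in n_bases with probability prob_detect, i.e. the p solving
    (1 - p) ** n_bases == 1 - prob_detect.
    """
    return 1.0 - (1.0 - prob_detect) ** (1.0 / n_bases)


# Vesicular stomatitis virus: 20 changes in 19140 nucleotides of overlap.
vsv_rate = 20 / 19140

# IBV: 4659 bases of overlap, each read on two independent clones.
ibv_checked = 4659 * 2
ibv_bound = rate_giving_detection(ibv_checked)

# Expected changes per copied strand at 1 in 10000 over 20 kilobases.
expected_per_copy = 20000 * 1e-4

print(f"VSV observed rate:      {vsv_rate:.2e}")
print(f"IBV nucleotides checked: {ibv_checked}")
print(f"IBV 95% bound:          {ibv_bound:.2e}")
print(f"Expected changes/copy:  {expected_per_copy:.1f}")
```

Under these assumptions the VSV figure comes out close to 10⁻³ and the IBV bound close to 3.2 × 10⁻⁴, in agreement with the values quoted in the text.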

Sequencing of cDNA clones from the ‘unique’ region of mRNA F has revealed the rather 
unexpected presence of two large ORFs. Although the sequence in the region between these has 
been obtained from three independent cDNA clones and from the virion RNA, the possibility 
of some bizarre form of sequence artefact cannot be totally discounted. It will be interesting to 
see if a similar frameshift occurs in an equivalent position in the coronavirus MHV genome. 
Experiments can now be designed to confirm the reading frame switch by other means. For 
example in vitro translation of SP6 polymerase transcripts from this region can be performed and 
the sizes of the products determined. Although no mRNA has been detected with a 5’ end near 
the beginning of the second ORF, a search for a low abundance mRNA species can now be 
carried out by primer extension from mRNA preparations. In addition, the availability of 
sequence data from the IBV polymerase(s) allows antisera to be raised against products 
expressed from selected parts of the sequence. These will prove useful in determining the fate of 
the large polypeptides predicted from the nucleotide sequence, showing whether post- 
translational cleavage occurs, and attempting to unravel the relationship between the various 
polymerase activities which have been detected in coronavirus-infected cells. 